Penalty test data amplification method based on multi-objective optimization

Document No.: 971433 | Publication date: 2020-11-03

Reading note: this technology, "Penalty test data amplification method based on multi-objective optimization" (一种基于多目标优化的刑罚测试数据扩增方法), was designed by 夏春艳, 张岩 and 李明 on 2020-08-01. Main content: A penalty test data amplification method based on multi-objective optimization, characterized in that optimization technology is applied to test data amplification for judicial penalty prediction. By combining optimization with data amplification, the method generates test data with strong generalization capability for judicial intelligent software, so that the accuracy of a penalty prediction model can be tested effectively. The method has two main steps. The first step is test data amplification: starting from the original test set of the penalty prediction model, a large amount of amplified data with unchanged labels is obtained by reordering, inserting and deleting sentences in the text. The second step is test data optimization: taking the precision, recall and F1 value of the test data, together with the importance of the test data for the penalty prediction model, as optimization objectives, the selection, crossover and mutation operations of a multi-objective genetic algorithm are used to search the amplified data for high-quality test data, thereby increasing the quantity and diversity of the amplified test data and improving its generalization capability. The invention can amplify test data for a penalty prediction model from judicial text data, which addresses the shortage of judicial test data and helps ensure the quality of intelligent software testing.

1. A penalty test data amplification method based on multi-objective optimization, characterized in that a multi-objective optimization technique based on a genetic algorithm is applied to the amplification of judicial penalty test data; by combining the multi-objective optimization technique with a data amplification method, test data with strong generalization capability is generated for judicial intelligent software, so that the accuracy of a penalty prediction model is tested effectively; the method comprises the following steps:

1) Test data amplification: on the basis of the original test set of the penalty prediction model, a large amount of amplified data with unchanged labels is obtained by reordering, inserting and deleting sentences in the text. First, well-structured judicial document text data is input as the original test data set D_0; the text test data is classified according to the "charge" label, the contents of the "case facts" and "penalty result" labels of each case are extracted, preliminary text preprocessing is applied to the "case facts", and simple statistics are computed on the "penalty result", which is divided into three categories: death penalty, life imprisonment and fixed-term imprisonment. Then, for cases with the same "charge" label, taking a whole sentence of the "case facts" text as the basic unit, D_0 is amplified in three ways (scrambling sentences, deleting a sentence and inserting a sentence) to obtain three new data sets of the same size as the original data set, which are merged into the amplified data set D;

2) Test data optimization: taking the precision, recall and F1 value of the test data, together with the importance of the test data for the penalty prediction model, as optimization objectives, high-quality test data is searched from the large amount of amplified data through the selection, crossover and mutation operations of a genetic algorithm, thereby increasing the quantity and diversity of the amplified test data and improving its generalization capability. First, the serial numbers of the case data in the amplified data set D are used as input data; all the data are fully permuted and n permutation sequences are randomly selected to construct the initial population, each individual containing m test cases; individuals use decimal encoding, and each gene locus holds the serial number of the corresponding test case. Second, the population data is fed into the penalty prediction model to obtain the precision, recall and F1 value of the test data; for cases with the same charge label, the frequencies with which the three classes of penalty data (death penalty, life imprisonment and fixed-term imprisonment) occur within the individual and within the population are counted, and the importance of the individual containing the cases is evaluated comprehensively. Third, with the precision, recall, F1 value and importance of the test data as optimization objectives, the selection, crossover and mutation operations of the multi-objective genetic algorithm are used to search for high-quality test data, yielding the amplified test data set D'. Fourth, the amplified test data set D' is fed into the penalty prediction model for testing, and the accuracy of the model is calculated in order to check the generalization capability of the amplified test data obtained by the method.

2. The genetic-algorithm-based penalty prediction test data amplification method according to claim 1, characterized in that, in step 1), the test data is amplified as follows. First, the original test data set D_0 = {d_1, d_2, …, d_t} is input, where d_t denotes the t-th test datum; the test data is well-structured judicial document text data that mainly comprises the charge, the time of the case, the place of the case, the case facts, the penalty result and other information. Second, the case text test data is classified according to the charge, and the contents of the case facts and penalty result labels of each case are extracted. Third, preliminary text preprocessing is applied to the "case facts", removing meaningless stop words from the text according to a common stop word list; simple statistics are computed on the penalty results, which are divided into three categories: death penalty, life imprisonment and fixed-term imprisonment. Fourth, for cases with the same "charge" label, D_0 is amplified by operating on the sentences of the "case facts" text in three ways (scrambling sentences, deleting a sentence and inserting a sentence), as follows:

Scrambling: the method takes a complete sentence of the text as the basic unit. Because sentence order has little influence on the meaning of a text that describes facts, the order of the sentences in the original text is randomly scrambled, which yields an amplified data set with the same labels and the same size as the original text data, D_1 = {d_11, d_12, …, d_1t};

Deleting: a text that describes facts contains many redundant sentences that contribute little to its meaning, and removing one of them does not affect the understanding of the case, so one sentence of the original text is deleted at random; if the original text contains only one sentence, it is left unprocessed; the same deletion operation is applied to every text, which yields an amplified data set with the same labels and the same size as the original text data, D_2 = {d_21, d_22, …, d_2t};

Inserting: cases with the same charge share many similar sentences in their descriptions, so text data with the same charge label is grouped into one class; a sentence is selected from another text with the same label and inserted at a random position in the original text, which yields an amplified data set with the same labels as the original text data, D_3 = {d_31, d_32, …, d_3t};

With these three data amplification methods, three new data sets of the same size as the original data set are obtained and merged into the amplified data set D = D_1 ∪ D_2 ∪ D_3.

3. The genetic-algorithm-based penalty prediction test data amplification method according to claim 1, characterized in that, in step 2), the test data is optimized as follows:

1) The serial numbers of the case data in the structured-text amplified data set D are used as input data; all the data are fully permuted and n permutation sequences are randomly selected to construct the initial population; individuals use decimal encoding, and each gene locus holds the serial number of the corresponding test case. The initial population is x = {x_1, x_2, …, x_i, …, x_n}, where n is the population size; the i-th individual of x is x_i = {x_{i,1}, x_{i,2}, …, x_{i,j}, …, x_{i,m}}, where x_{i,j} denotes the j-th test case of x_i and m is the number of test cases contained in x_i;

2) The population data is fed into the penalty prediction model to obtain the precision, recall and F1 value of the test data; for cases with the same charge label, the frequencies with which the three classes of penalty data (death penalty, life imprisonment and fixed-term imprisonment) occur within the individual and within the population are counted, and the importance of the individual containing the cases is evaluated comprehensively; the calculation formulas are as follows:

Precision is the probability that a sample predicted as correct by the model is actually correct:

$$\mathrm{Precision}(x_i) = \frac{TP}{TP + FP} \tag{1}$$

Recall is the probability that an actually correct sample is predicted as correct by the model:

$$\mathrm{Recall}(x_i) = \frac{TP}{TP + FN} \tag{2}$$

The F1 value evaluates precision and recall together and gives the balance point at which both are as high as possible at the same time:

$$F1(x_i) = \frac{2 \cdot \mathrm{Precision}(x_i) \cdot \mathrm{Recall}(x_i)}{\mathrm{Precision}(x_i) + \mathrm{Recall}(x_i)} \tag{3}$$

where TP is the number of samples of x_i that the model correctly predicts as true; FP is the number of samples of x_i that the model predicts as true although they are actually false; and FN is the number of samples of x_i that the model predicts as false although they are actually true;

The importance comprehensively evaluates how important an individual is; it is directly proportional to the frequency within the individual and inversely proportional to the frequency within the population, and is expressed as

(4)

Wherein

(5)

In the above formulas, m denotes the total number of test data contained in x_i; for k = 1, 2, 3, the count terms give the number of times data of the death penalty, life imprisonment and fixed-term imprisonment classes, respectively, occur in x_i.

3) With the precision, recall, F1 value and importance of the test data as optimization objectives, the selection, crossover and mutation operations of a multi-objective genetic algorithm are used to search for high-quality test data, yielding the amplified test data set D' = {d_1', d_2', …, d_m'}; the selection operation uses a tournament selection strategy, the crossover operation uses cycle crossover, and the mutation operation uses serial-number mutation, specifically:

Selection operator: a tournament selection strategy is used; each time, n/2 individuals are randomly selected from the population of size n, the Pareto-optimal solutions among them are found with a non-dominated sorting algorithm, and the best individual among these solutions enters the offspring population; this is repeated until the new population reaches size n;

Crossover operator: cycle crossover is used. First, a gene is randomly selected on parent 1; the gene number at the corresponding position of parent 2 is looked up; the position holding that same number is then located in parent 1; this is repeated until a cycle is closed, and the positions of all genes in the cycle are the selected positions. Second, the offspring is built from the genes selected in parent 1, each kept at its original position. Third, the remaining genes of parent 2 are placed into the offspring;

Mutation operator: serial-number mutation is used; a gene locus of the parent individual is selected at random, the test case at that locus is deleted, and a test case that does not duplicate any gene already present in the current individual is inserted at random, forming a new offspring individual;

4) The amplified test data set obtained by the method is fed into the penalty prediction model for testing, and the accuracy of the model is calculated in order to check the generalization capability of the amplified test data; accuracy is a general metric for evaluating the performance of a deep learning model and is, for a given test data set, the ratio of the number of samples correctly classified by the model to the total number of samples:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

where TP and TN are the numbers of samples the model correctly predicts as true and as false, and FP and FN are the numbers of samples it incorrectly predicts as true and as false.

Technical Field

The invention belongs to the field of intelligent software testing and also involves deep learning and natural language processing. It is a test data amplification method based on judicial texts that aims to generate test data with strong generalization capability for intelligent software and can effectively test the performance of penalty prediction models.

Background

With the advance of court informatization 3.0 and the rapid development of machine learning, the volume of judicial data that computers can store and process has grown quickly, and judicial authorities have successively introduced deep learning methods into legal services to make them intelligent. In recent years, research on intelligent penalty prediction software that takes judicial texts such as judgment documents and case facts as input has produced results such as automatic sentence prediction and charge prediction. To improve the accuracy of penalty prediction models, researchers keep increasing the number of model parameters and the size of the training set, which places new requirements on the data sets used to test model performance, namely on the quantity and diversity of the test data. In practice, however, large amounts of labeled data are hard to obtain in the penalty prediction field. In this situation, data amplification applies slight transformations to the original test data, so that more test data is obtained without collecting any new original data; this is an effective way to produce a large amount of test data with unchanged labels for a penalty prediction model.

With the development of text analysis and the continuing emergence of data amplification methods, the amount of amplified judicial text test data grows quickly, so an optimization technique is needed to screen out the high-quality test data that can actually test model performance. The genetic algorithm is a heuristic search algorithm: an adaptive, probabilistic global optimization method modeled on heredity and evolution of organisms in nature. Its main aim is to find the optimal or a near-optimal solution of a problem in the search space quickly, and it has been applied successfully in optimization. The invention therefore combines optimization with data amplification and converts the problem of searching high-quality test data from a large amount of amplified data into a multi-objective optimization problem solved with a genetic algorithm. This increases the quantity and diversity of the test data for the penalty prediction model and provides amplified test data with better generalization capability for prediction models obtained by deep learning.

Disclosure of Invention

The method amplifies the text data of judicial documents to obtain a large amount of amplified test data and combines this with a multi-objective optimization technique based on a genetic algorithm to search the amplified data automatically for high-quality test data suited to penalty prediction. It thereby provides the penalty prediction model with amplified test data of high generalization capability and meets the model's requirements on the quantity and diversity of test data.

To achieve this object, the invention provides a penalty test data amplification method based on multi-objective optimization. First, on the basis of the original test set of a penalty prediction model, a large amount of amplified data with unchanged labels is obtained by reordering, inserting and deleting sentences in the text. Then, with the precision, recall and F1 value of the test data and the importance of the test data for the penalty prediction model as optimization objectives, a genetic algorithm searches the amplified data for high-quality test data, which increases the quantity and diversity of the amplified test data and improves its generalization capability. The method comprises the following steps.

Step 1. Test data amplification

(1) First, the original test data set D_0 = {d_1, d_2, …, d_t} is input, where d_t denotes the t-th test datum. The test data is well-structured judicial document text data that mainly comprises the charge (Accusation), the time of the case (Date), the place of the case (Address), the case facts (Fact), the penalty result (Penalty) and other information. Second, the case text test data is classified by charge (Accusation), and the contents of the case facts (Fact) and penalty result (Penalty) labels of each case are extracted. Then, preliminary text preprocessing is applied to the case facts (Fact), removing meaningless stop words according to a common stop word list; simple statistics are computed on the penalty result (Penalty), which is divided into three categories: death penalty (Death_penalty), life imprisonment (Life_imprisonment) and fixed-term imprisonment (Imprisonment).

(2) So as not to change the meaning of the original text, for cases with the same charge (Accusation) label, D_0 is amplified by scrambling, deleting and inserting sentences of the case facts (Fact) text, as follows. Scrambling: the order of the sentences in the original text is randomly scrambled, giving an amplified data set with the same labels, D_1 = {d_11, d_12, …, d_1t}. Deleting: one sentence of the original text is deleted at random, giving an amplified data set with the same labels, D_2 = {d_21, d_22, …, d_2t}. Inserting: one sentence is selected from another text with the same label and inserted at a random position in the original text, giving an amplified data set with the same labels, D_3 = {d_31, d_32, …, d_3t}. With these three data amplification methods, three new data sets of the same size as the original data set are obtained and merged into the amplified data set D = D_1 ∪ D_2 ∪ D_3.

Step 2. Test data optimization

(1) The serial numbers of the case data in the structured-text amplified data set D are used as input data; all the data are fully permuted and n permutation sequences are randomly selected to construct the initial population. Individuals use decimal encoding, and each gene locus holds the serial number of the corresponding test case. The initial population is x = {x_1, x_2, …, x_i, …, x_n}, where n is the population size; the i-th individual of x is x_i = {x_{i,1}, x_{i,2}, …, x_{i,j}, …, x_{i,m}}, where x_{i,j} denotes the j-th test case of x_i and m is the number of test cases contained in x_i.

(2) The population data is fed into the penalty prediction model to obtain the precision, recall and F1 value of the test data. For cases with the same charge (Accusation) label, the frequencies with which the three classes of penalty data, death penalty (Death_penalty), life imprisonment (Life_imprisonment) and fixed-term imprisonment (Imprisonment), occur within the individual and within the population are counted, and the importance Important(x_i) of the individual containing the cases is evaluated comprehensively.

(3) With the precision, recall, F1 value and importance of the test data as optimization objectives, the selection, crossover and mutation operations of a multi-objective genetic algorithm are used to search for high-quality test data, yielding the amplified test data set D' = {d_1', d_2', …, d_m'}. The selection operation uses a tournament selection strategy, the crossover operation uses cycle crossover, and the mutation operation uses serial-number mutation.

(4) The amplified test data set obtained by the method is fed into the penalty prediction model for testing, and the accuracy of the model is calculated in order to check the generalization capability of the amplified test data.

Drawings

FIG. 1 is a flow chart of the penalty test data amplification method based on multi-objective optimization.

FIG. 2 is a flow chart of the test data amplification of FIG. 1.

FIG. 3 is a flow chart of test data optimization in FIG. 1.

Detailed Description

In order to clearly understand the technical contents of the present invention, specific embodiments are described below with reference to the accompanying drawings.

FIG. 1 is a flow chart of the penalty test data amplification method based on multi-objective optimization implemented by the invention.

A penalty test data amplification method based on multi-objective optimization is characterized by comprising the following steps:

S1, test data amplification. First, well-structured judicial document data is input as the original test data set D_0; the text test data is classified according to the "charge" label, the contents of the "case facts" and "penalty result" labels of each case are extracted, preliminary text preprocessing is applied to the "case facts", and simple statistics are computed on the "penalty result", which is divided into three categories: death penalty, life imprisonment and fixed-term imprisonment. Then, for cases with the same "charge" label, taking a whole sentence of the "case facts" text as the basic unit, D_0 is amplified in three ways (scrambling sentences, deleting a sentence and inserting a sentence) to obtain three new data sets of the same size as the original data set, which are merged into the amplified data set D = D_1 ∪ D_2 ∪ D_3.

S2, test data optimization. First, the serial numbers of the test data in the amplified data set D are used as input data; all the data are fully permuted and n permutation sequences are randomly selected to construct the initial population, each individual containing m test cases. Individuals use decimal encoding, and each gene locus holds the serial number of the corresponding test case. Second, the population data is fed into the penalty prediction model to obtain the precision, recall and F1 value of the test data; for cases with the same charge (Accusation) label, the frequencies with which the three classes of penalty data, death penalty (Death_penalty), life imprisonment (Life_imprisonment) and fixed-term imprisonment (Imprisonment), occur within the individual and within the population are counted, and the importance Important(x_i) of the individual containing the cases is evaluated comprehensively. Third, with the precision, recall, F1 value and importance of the test data as optimization objectives, the selection, crossover and mutation operations of the multi-objective genetic algorithm are used to search for high-quality test data, yielding the amplified test data set D'. The selection operation uses a tournament selection strategy, the crossover operation uses cycle crossover, and the mutation operation uses serial-number mutation. Fourth, the amplified test data set D' is fed into the penalty prediction model for testing, and the accuracy of the model is calculated in order to check the generalization capability of the amplified test data obtained by the method.

FIG. 2 is a flow chart of the test data amplification of S1; the detailed implementation of S1 is described with reference to FIG. 2 as follows. First, the original test data set D_0 = {d_1, d_2, …, d_t} is input, where d_t denotes the t-th test datum. The test data is well-structured judicial document text data that mainly comprises the charge (Accusation), the time of the case (Date), the place of the case (Address), the case facts (Fact), the penalty result (Penalty) and other information. Second, the case text test data is classified by charge (Accusation), and the contents of the case facts (Fact) and penalty result (Penalty) labels of each case are extracted. Third, preliminary text preprocessing is applied to the case facts (Fact), removing meaningless stop words according to a common stop word list; simple statistics are computed on the penalty result (Penalty), which is divided into three categories: death penalty (Death_penalty), life imprisonment (Life_imprisonment) and fixed-term imprisonment (Imprisonment). Fourth, for cases with the same charge (Accusation) label, D_0 is amplified by operating on the sentences of the case facts (Fact) text in three ways (scrambling sentences, deleting a sentence and inserting a sentence), as follows:

1) Scrambling: the method takes a complete sentence of the text as the basic unit. Because sentence order has little influence on the meaning of a text that describes facts, the order of the sentences in the original text is randomly scrambled, which yields an amplified data set with the same labels and the same size as the original text data, D_1 = {d_11, d_12, …, d_1t}.

2) Deleting: a text that describes facts contains many redundant sentences that contribute little to its meaning, and removing one of them does not affect the understanding of the case, so one sentence of the original text is deleted at random. If the original text contains only one sentence, it is left unprocessed. The same deletion operation is applied to every text, which yields an amplified data set with the same labels and the same size as the original text data, D_2 = {d_21, d_22, …, d_2t}.

3) Inserting: cases with the same charge share many similar sentences in their descriptions, so text data with the same charge label is grouped into one class. A sentence is selected from another text with the same label and inserted at a random position in the original text, which yields an amplified data set with the same labels as the original text data, D_3 = {d_31, d_32, …, d_3t}.

Data amplification is a very efficient way to increase the size of a data set. With these three data amplification methods, three new data sets of the same size as the original data set are obtained and merged into the amplified data set D = D_1 ∪ D_2 ∪ D_3.
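The three amplification operations can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the record keys `accusation`, `fact` and `penalty`, the helper names, and the choice of the full stop "。" as sentence delimiter are all hypothetical. When a case has no other case with the same charge, the sketch falls back to the case's own text as donor, which the source does not specify.

```python
import random

def split_sentences(text, delimiter="。"):
    """Split a case-fact text into whole sentences (the delimiter is an assumption)."""
    return [s for s in text.split(delimiter) if s.strip()]

def join_sentences(sentences, delimiter="。"):
    return delimiter.join(sentences) + delimiter

def scramble(text, rng):
    """Randomly reorder the sentences of one text."""
    sentences = split_sentences(text)
    rng.shuffle(sentences)
    return join_sentences(sentences)

def delete_one(text, rng):
    """Delete one random sentence; single-sentence texts are left unchanged."""
    sentences = split_sentences(text)
    if len(sentences) <= 1:
        return text
    del sentences[rng.randrange(len(sentences))]
    return join_sentences(sentences)

def insert_one(text, donor_text, rng):
    """Insert one random sentence of donor_text at a random position in text."""
    sentences = split_sentences(text)
    donor = split_sentences(donor_text)
    if not donor:
        return text
    sentences.insert(rng.randint(0, len(sentences)), rng.choice(donor))
    return join_sentences(sentences)

def augment(cases, seed=0):
    """Return D = D_1 + D_2 + D_3; labels are copied unchanged from each case."""
    rng = random.Random(seed)
    by_charge = {}
    for case in cases:
        by_charge.setdefault(case["accusation"], []).append(case)
    d1, d2, d3 = [], [], []
    for case in cases:
        others = [o for o in by_charge[case["accusation"]] if o is not case]
        # Assumption: fall back to the case's own text if no same-charge donor exists.
        donor = rng.choice(others)["fact"] if others else case["fact"]
        d1.append({**case, "fact": scramble(case["fact"], rng)})
        d2.append({**case, "fact": delete_one(case["fact"], rng)})
        d3.append({**case, "fact": insert_one(case["fact"], donor, rng)})
    return d1 + d2 + d3
```

Calling `augment(cases)` on t cases returns 3t records: the first t scrambled, the next t with one sentence removed, and the last t with one sentence inserted.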

Further, the specific steps of the amplification of the test data of S1 are as follows:

Step S1-1: Start;

Step S1-2: Input the judgment document text data set D_0 = {d_1, d_2, …, d_t}, where d_t denotes the t-th case datum;

Step S1-3: Classify the test data by charge (Accusation), and extract the case facts (Fact) and penalty result (Penalty) according to their labels;

Step S1-4: Preprocess the text, then count and classify the penalty results (Penalty): death penalty (Death_penalty), life imprisonment (Life_imprisonment) and fixed-term imprisonment (Imprisonment);

Step S1-5: Randomly scramble the order of the sentences in the original text to obtain an amplified data set with the same labels and the same size as the original text data, D_1 = {d_11, d_12, …, d_1t};

Step S1-6: Randomly delete one sentence of the original text to obtain an amplified data set with the same labels and the same size as the original text data, D_2 = {d_21, d_22, …, d_2t};

Step S1-7: Select a sentence from another text with the same label and insert it at a random position in the original text to obtain an amplified data set with the same labels and the same size as the original text data, D_3 = {d_31, d_32, …, d_3t};

Step S1-8: Output the amplified data set D = D_1 ∪ D_2 ∪ D_3;

Step S1-9: End.
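Steps S1-3 and S1-4 (grouping by charge, stop word removal and penalty classification) might look like the following Python sketch. The record layout is an assumption made only for illustration: a `penalty` dictionary with hypothetical boolean fields `death_penalty` and `life_imprisonment`, whitespace-tokenized facts, and a tiny placeholder stop word list. Real Chinese text would need a proper word segmenter and a full stop word list.

```python
from collections import defaultdict

# Placeholder stop word list; a real run would load a common stop word list.
STOP_WORDS = {"the", "a", "of", "的", "了"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in stop_words]

def penalty_category(penalty):
    """Map a penalty record to one of the three classes named in the text."""
    if penalty.get("death_penalty"):
        return "Death_penalty"
    if penalty.get("life_imprisonment"):
        return "Life_imprisonment"
    return "Imprisonment"

def preprocess(cases):
    """Group cases by charge and attach cleaned tokens and a penalty class."""
    groups = defaultdict(list)
    for case in cases:
        groups[case["accusation"]].append({
            "fact_tokens": remove_stop_words(case["fact"].split()),
            "penalty_class": penalty_category(case["penalty"]),
        })
    return dict(groups)
```

The grouping by charge is what later lets the insertion operation pick a donor sentence from a text with the same label.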

Fig. 3 is a flowchart of test data optimization, and the specific implementation method of S2 is described as follows in conjunction with fig. 3.

(1) The serial numbers of the case data in the structured-text amplified data set D are used as input data; all the data are fully permuted and n permutation sequences are randomly selected to construct the initial population. Individuals use decimal encoding, and each gene locus holds the serial number of the corresponding test case. The initial population is x = {x_1, x_2, …, x_i, …, x_n}, where n is the population size; the i-th individual of x is x_i = {x_{i,1}, x_{i,2}, …, x_{i,j}, …, x_{i,m}}, where x_{i,j} denotes the j-th test case of x_i and m is the number of test cases contained in x_i.
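As a rough Python illustration of this encoding, each individual can be drawn as m distinct serial numbers in random order, which is equivalent to taking the first m entries of a random permutation of all serial numbers. That reading of "permutation sequences", the function name and the 1-based numbering are assumptions.

```python
import random

def init_population(num_cases, n, m, seed=0):
    """Build n individuals, each a list of m distinct case serial numbers.

    Serial numbers run from 1 to num_cases (1-based numbering is an assumption).
    """
    rng = random.Random(seed)
    serials = range(1, num_cases + 1)
    # rng.sample draws without replacement, i.e. a length-m prefix of a random permutation.
    return [rng.sample(serials, m) for _ in range(n)]
```

For example, `init_population(num_cases=30, n=5, m=8)` yields five individuals of eight genes each, with no serial number repeated inside an individual.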

(2) The population data is fed into the penalty prediction model to obtain the precision, recall and F1 value of the test data, calculated as follows:

Precision: the probability that a sample predicted as correct by the model is actually correct:

$$\mathrm{Precision}(x_i) = \frac{TP}{TP + FP} \tag{1}$$

Recall: the probability that an actually correct sample is predicted as correct by the model:

$$\mathrm{Recall}(x_i) = \frac{TP}{TP + FN} \tag{2}$$

F1 value: evaluates precision and recall together and gives the balance point at which both are as high as possible at the same time:

$$F1(x_i) = \frac{2 \cdot \mathrm{Precision}(x_i) \cdot \mathrm{Recall}(x_i)}{\mathrm{Precision}(x_i) + \mathrm{Recall}(x_i)} \tag{3}$$

where TP is the number of samples of x_i that the model correctly predicts as true; FP is the number of samples of x_i that the model predicts as true although they are actually false; and FN is the number of samples of x_i that the model predicts as false although they are actually true.
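The three metrics can be computed from the usual counts of true positives, false positives and false negatives, as in the Python sketch below. The function name, the one-vs-rest treatment of a single `positive` class and the convention of returning 0.0 for an empty denominator are assumptions; the source does not say how multi-class penalty labels are aggregated.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Return (precision, recall, F1) for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumption: an empty denominator yields 0.0 instead of raising an error.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With true labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 1, 0, 1]`, the counts are two true positives, one false positive and one false negative, so all three values equal 2/3.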

For cases with the same charge (Accusation) label, the frequencies with which the three classes of penalty data, death penalty (Death_penalty), life imprisonment (Life_imprisonment) and fixed-term imprisonment (Imprisonment), occur within the individual and within the population are counted, and the importance Important(x_i) of the individual containing the cases is evaluated comprehensively; the calculation formula is as follows:

(4)

wherein

Figure RE-920071DEST_PATH_IMAGE008

(5)

In the above formula, the first and second carbon atoms are,mto representx i Total number of test data contained whenk=1,2,3 times of the total weight of the mixture,the data of three types of characteristics of death sentences, no-term prisoners and term prisoners are respectively shown inx i The number of times of occurrence of (a),the data of three types of characteristics of death sentences, no-term prisoners and term prisoners are respectively shown inx i Distribution of (2).
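The counting step that feeds the importance measure can be sketched as below. The sketch covers only the frequency counts; it does not attempt to reproduce the importance formula itself, which is not recoverable from the text. The mapping `penalty_of` from serial number to penalty label, the helper names, and the reading of "distribution" as the share of each penalty type among the $m$ cases are all assumptions.

```python
from collections import Counter

PENALTY_TYPES = ("Death_penalty", "Life_Imprisonment", "Imprisonment")

def penalty_counts(individual, penalty_of):
    """Count how often each of the three penalty types occurs in an individual."""
    counts = Counter(penalty_of[case] for case in individual)
    return {k: counts.get(k, 0) for k in PENALTY_TYPES}

def penalty_shares(individual, penalty_of):
    """Share of each penalty type among the m cases of the individual (assumed reading)."""
    m = len(individual)
    counts = penalty_counts(individual, penalty_of)
    return {k: counts[k] / m for k in PENALTY_TYPES}

# Hypothetical labels for five test cases, keyed by serial number.
penalty_of = {
    1: "Imprisonment",
    2: "Imprisonment",
    3: "Life_Imprisonment",
    4: "Death_penalty",
    5: "Imprisonment",
}
individual = [1, 3, 4, 5]
counts = penalty_counts(individual, penalty_of)
shares = penalty_shares(individual, penalty_of)
```

The same two helpers can be applied to the concatenation of all individuals to obtain population-level counts.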

(3) With the precision, recall, F1 value and importance of the test data as optimization objectives, high-quality test data are searched for using the selection, crossover and mutation operations of a multi-objective genetic algorithm, yielding the amplified test data set $D' = \{d'_1, d'_2, \ldots, d'_m\}$. The selection operation uses a tournament selection strategy, the crossover operation uses the cycle crossover method, and the mutation operation uses a serial-number mutation method, specifically as follows:

Selection operator: a tournament selection strategy is used. Each time, $n/2$ individuals are randomly selected from the population of size $n$; the Pareto-optimal solutions among them are then found with a non-dominated sorting algorithm, and the best individual among the Pareto-optimal solutions enters the offspring population. This operation is repeated until the new population reaches size $n$;

Crossover operator: the cycle crossover method is used. First, a gene is randomly selected on parent 1, the gene at the corresponding position on parent 2 is looked up, and the position holding that same gene number is then located back on parent 1; this is repeated until a cycle is formed, and the positions of all genes in the cycle are the selected positions. Second, the offspring is generated from the selected genes of parent 1, each kept at its original position. Third, the remaining genes of parent 2 are placed into the offspring;

Mutation operator: a serial-number mutation method is used. A gene position of the parent individual is randomly selected, the test case at that position is deleted, and a test case that does not duplicate any gene already present in the current individual is then randomly inserted to form a new offspring individual.
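The three operators described above can be sketched together as follows. This is an illustration under stated assumptions, not the patented implementation: all objectives are taken to be maximized, the "best" individual is picked at random from the non-dominated candidates (the text does not say how ties are broken), cycle crossover assumes both parents contain the same set of serial numbers, and mutation puts the new test case at the position of the deleted one. All function names are hypothetical.

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def tournament_select(population, objectives, rng):
    """Draw n/2 candidates, keep the non-dominated ones, return a copy of one."""
    n = len(population)
    candidates = rng.sample(range(n), max(1, n // 2))
    front = [i for i in candidates
             if not any(dominates(objectives[j], objectives[i]) for j in candidates)]
    return list(population[rng.choice(front)])  # assumed tie-break: random pick

def cycle_crossover(p1, p2, rng):
    """Cycle crossover for two orderings of the same set of serial numbers."""
    pos_in_p1 = {gene: idx for idx, gene in enumerate(p1)}
    child = list(p2)                  # positions outside the cycle come from parent 2
    start = idx = rng.randrange(len(p1))
    while True:
        child[idx] = p1[idx]          # positions on the cycle keep parent 1's gene
        idx = pos_in_p1[p2[idx]]      # follow parent 2's gene back to parent 1
        if idx == start:
            return child

def serial_mutation(individual, all_serials, rng):
    """Replace one randomly chosen gene with a serial number not yet present."""
    present = set(individual)
    unused = [s for s in all_serials if s not in present]
    child = list(individual)
    if unused:
        child[rng.randrange(len(child))] = rng.choice(unused)
    return child

rng = random.Random(42)
p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [8, 5, 2, 1, 3, 6, 4, 7]
child = cycle_crossover(p1, p2, rng)
mutant = serial_mutation(child, range(1, 13), rng)

# Hypothetical objective vectors (precision, recall, F1, importance).
population = [p1, p2, child, mutant]
objectives = [(0.8, 0.7, 0.75, 0.5), (0.6, 0.9, 0.72, 0.4),
              (0.7, 0.7, 0.70, 0.6), (0.5, 0.5, 0.50, 0.3)]
selected = tournament_select(population, objectives, rng)
```

Because every position of `child` holds the gene of one parent at that same position, cycle crossover never introduces duplicates, which is why it suits the serial-number encoding used here.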

(4) The amplified test data set obtained above is injected into the penalty prediction model for testing, and the accuracy of the model is calculated in order to test the generalization capability of the amplified test data produced by the method. Accuracy is a general index for evaluating the performance of a deep learning model: for a given test data set, it is the ratio of the number of samples correctly classified by the model to the total number of samples, expressed as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

At the micro level, $TP$ is the number of samples in $x_i$ that the model correctly predicts as true; $TN$ is the number of samples in $x_i$ that the model correctly predicts as false; $FP$ is the number of samples in $x_i$ that the model incorrectly predicts as true; $FN$ is the number of samples in $x_i$ that the model incorrectly predicts as false.
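The accuracy definition above, the share of correctly classified samples among all samples, can be sketched as follows. The function name `accuracy` and the sample labels are hypothetical, and returning 0.0 for an empty test set is a convention chosen for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0  # assumed convention for an empty test set
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Toy example: three of four penalty labels are predicted correctly.
y_true = ["Imprisonment", "Life_Imprisonment", "Death_penalty", "Imprisonment"]
y_pred = ["Imprisonment", "Imprisonment", "Death_penalty", "Imprisonment"]
acc = accuracy(y_true, y_pred)
```

Counting matches directly is equivalent to $(TP + TN)$ over the total in the binary case and also extends to multi-class labels.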

Further, the specific steps of the test data optimization in S2 are as follows:

Step S2-1: Start state;

Step S2-2: Input the amplified dataset $D$, composed of the subsets $D_1$, $D_2$, …;

Step S2-3: Randomly select test data sequences and construct the initial population of the genetic algorithm, $x = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$, where $n$ is the population size; the $i$-th individual of $x$ is $x_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,j}, \ldots, x_{i,m}\}$, where $x_{i,j}$ denotes the $j$-th test case of $x_i$ and $m$ is the number of test cases contained in $x_i$;

Step S2-4: Inject the test data into the penalty prediction model and calculate the precision, recall and F1 value from the resulting statistics;

Step S2-5: For cases with the same accusation label, count the frequency with which the three types of penalty data, namely death penalty (Death_penalty), life imprisonment (Life_Imprisonment) and fixed-term imprisonment (Imprisonment), occur in each class of cases within individuals and within the population;

Step S2-6: Comprehensively evaluate the importance $\mathit{Important}(x_i)$ of each individual according to the importance formula;

Step S2-7: With the precision, recall, F1 value and importance of the test data as optimization objectives, search for high-quality test data using the selection, crossover and mutation operations of the multi-objective genetic algorithm to obtain the amplified test data set $D' = \{d'_1, d'_2, \ldots, d'_m\}$;

Step S2-8: Output the amplified test data set $D'$ for the penalty prediction model;

Step S2-9: Inject the amplified test data set $D'$ into the penalty prediction model and calculate the accuracy of the model;

Step S2-10: End state.

In conclusion, the method effectively addresses the difficulty of guaranteeing software quality when actual test data for deep-learning-based penalty prediction software is lacking. It helps software testers obtain amplified test data with strong generalization capability and thereby safeguards the quality of intelligent software testing.
