Random grouping method, device, computer equipment and storage medium

Document No.: 1955266 Publication date: 2021-12-10

Reading note: This technology, "Random grouping method, device, computer equipment and storage medium", was designed by 文天才, 刘保延, 何丽云 and 雒琳 on 2021-08-06. The invention relates to the field of data statistics, in particular to a random grouping method, a random grouping device, computer equipment and a storage medium. The method comprises the following steps: obtaining a sample to be grouped; adding the sample to be grouped to each processing group in turn and, for each processing group to which the sample is added, obtaining a corresponding first comprehensive index value based on the qualitative data of each processing group and the number of processing groups, and a corresponding second comprehensive index value based on the quantitative data of each processing group and the number of processing groups; calculating, based on the first comprehensive index value and the second comprehensive index value, an intergroup distribution difference value corresponding to each processing group to which the sample is added; determining the processing group to which the sample was added when the intergroup distribution difference value is smallest; and assigning the sample to be grouped to that processing group. Because the intergroup distribution difference value reflects the balance of the processing groups more accurately, the processing group chosen for the sample to be grouped is more appropriate, which helps ensure the accuracy of the experiment.

1. A random grouping method, comprising the steps of:

obtaining a sample to be grouped, wherein the sample to be grouped comprises qualitative data and quantitative data;

adding the sample to be grouped to each processing group in turn and, for each processing group to which the sample is added, obtaining a corresponding first comprehensive index value based on the qualitative data of each processing group and the number of processing groups, and obtaining a corresponding second comprehensive index value based on the quantitative data of each processing group and the number of processing groups;

and calculating, based on the first comprehensive index value and the second comprehensive index value, an intergroup distribution difference value corresponding to each processing group to which the sample to be grouped is added, determining the processing group to which the sample was added when the intergroup distribution difference value is smallest, and assigning the sample to be grouped to that processing group.

2. The random grouping method of claim 1, wherein obtaining the first comprehensive index value based on the qualitative data of each processing group and the number of processing groups, and obtaining the second comprehensive index value based on the quantitative data of each processing group and the number of processing groups, comprises:

calculating, using a first mathematical model and based on the qualitative data of each processing group and the number of processing groups, the first comprehensive index value corresponding to the processing group to which the sample to be grouped is added;

and calculating, using a second mathematical model and based on the quantitative data of each processing group and the number of processing groups, the second comprehensive index value corresponding to the processing group to which the sample to be grouped is added.

3. The random grouping method of claim 2, wherein the first mathematical model is:

wherein $G$ represents the number of processing groups, and $g$ represents the $g$-th processing group; $L$ represents the total number of categories corresponding to the same qualitative data, and $l$ represents the $l$-th category; $f_a$ and $f_e$ respectively represent the observed frequency and the expected frequency, on the qualitative factor of the sample to be grouped, within the same qualitative data.

4. A random grouping method according to claim 2 or 3, wherein the second mathematical model is:

wherein $G$ represents the number of processing groups, and $g$ represents the $g$-th processing group; $\bar{x}_1$ and $\bar{x}_2$ respectively represent the means, on the quantitative factor of the sample to be grouped, of any two processing groups taken from all the processing groups; $s_1^2$ and $s_2^2$ respectively represent the variances of those two processing groups on the same quantitative factor; and $n_1$ and $n_2$ respectively represent the sample sizes of those two processing groups.

5. The random grouping method of claim 3 or 4, wherein calculating the intergroup distribution difference value based on the first comprehensive index value and the second comprehensive index value comprises:

calculating the intergroup distribution difference value using a third mathematical model, based on the first comprehensive index value and the second comprehensive index value.

6. The random grouping method of claim 5, wherein the third mathematical model is:

wherein $M$ represents the number of stratification factors, $m$ represents the $m$-th stratification factor, and $w_m$ represents the weight of the $m$-th stratification factor; when the attribute of the $m$-th stratification factor is qualitative data, $d_m$ is the first comprehensive index value; when the attribute of the $m$-th stratification factor is quantitative data, $d_m = d_n$, the second comprehensive index value.

7. The random grouping method according to claim 5 or 6, further comprising, before calculating the intergroup distribution difference value based on the first comprehensive index value and the second comprehensive index value:

and normalizing the first comprehensive index value and/or the second comprehensive index value.

8. A random grouping apparatus, comprising:

an acquisition module, used for obtaining a sample to be grouped, wherein the sample to be grouped comprises qualitative data and quantitative data;

a calculation module, used for adding the sample to be grouped to each processing group in turn and, for each processing group to which the sample is added, obtaining a corresponding first comprehensive index value based on the qualitative data of each processing group and the number of processing groups, and obtaining a corresponding second comprehensive index value based on the quantitative data of each processing group and the number of processing groups;

and a dividing module, used for calculating, based on the first comprehensive index value and the second comprehensive index value, an intergroup distribution difference value corresponding to each processing group to which the sample to be grouped is added, determining the processing group to which the sample was added when the intergroup distribution difference value is smallest, and assigning the sample to be grouped to that processing group.

9. A computer device, comprising: a memory and a processor, the memory and the processor being communicatively coupled to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the random grouping method of any of claims 1-7.

10. A computer-readable storage medium storing computer instructions for causing a computer to perform the random grouping method of any one of claims 1 to 7.

Technical Field

The invention relates to the field of data statistics, in particular to a random grouping method, a random grouping device, computer equipment and a storage medium.

Background

A clinical trial evaluates the efficacy and health effects of a medical intervention in a specific population of individuals who participate in a clinical study. To reach true and objective conclusions, clinical trials emphasize the principle of randomization.

The purpose of randomization is to distribute the various control factors evenly among the treatment groups, thereby eliminating their influence on the study results. With randomized grouping, the data entering each treatment group are kept as consistent as possible in their baseline characteristics, so that the groups are statistically comparable and the evaluation of the intervention's effect is not distorted by the control factors.

Dynamic randomization algorithms for clinical trials mainly rely on the conventional minimization method. However, the conventional minimization method can directly process only classification variables, not numerical variables, and when processing classification variables it considers only local balance among the processing groups. As a result, samples cannot be grouped accurately when classification variables and numerical variables must be handled simultaneously, which ultimately affects the accuracy of the experimental result.

Disclosure of Invention

Therefore, the technical problem to be solved by the invention is that the conventional minimization method cannot group samples accurately when classification variables and numerical variables are present at the same time, which ultimately affects the accuracy of the experimental result. To this end, the invention provides a random grouping method, comprising the following steps:

obtaining a sample to be grouped, wherein the sample to be grouped comprises qualitative data and quantitative data;

adding the sample to be grouped to each processing group in turn and, for each processing group to which the sample is added, obtaining a corresponding first comprehensive index value based on the qualitative data of each processing group and the number of processing groups, and obtaining a corresponding second comprehensive index value based on the quantitative data of each processing group and the number of processing groups;

and calculating, based on the first comprehensive index value and the second comprehensive index value, an intergroup distribution difference value corresponding to each processing group to which the sample to be grouped is added, determining the processing group to which the sample was added when the intergroup distribution difference value is smallest, and assigning the sample to be grouped to that processing group.

Preferably, obtaining the first comprehensive index value based on the qualitative data of each processing group and the number of processing groups, and obtaining the second comprehensive index value based on the quantitative data of each processing group and the number of processing groups, includes:

calculating, using a first mathematical model and based on the qualitative data of each processing group and the number of processing groups, the first comprehensive index value corresponding to the processing group to which the sample to be grouped is added;

and calculating, using a second mathematical model and based on the quantitative data of each processing group and the number of processing groups, the second comprehensive index value corresponding to the processing group to which the sample to be grouped is added.

Preferably, the first mathematical model is:

wherein $G$ represents the number of processing groups, and $g$ represents the $g$-th processing group; $L$ represents the total number of categories corresponding to the same qualitative data, and $l$ represents the $l$-th category; $f_a$ and $f_e$ respectively represent the observed frequency and the expected frequency, on the qualitative factor of the sample to be grouped, within the same qualitative data.

Preferably, the second mathematical model is:

wherein $G$ represents the number of processing groups, and $g$ represents the $g$-th processing group; $\bar{x}_1$ and $\bar{x}_2$ respectively represent the means, on the quantitative factor of the sample to be grouped, of any two processing groups taken from all the processing groups; $s_1^2$ and $s_2^2$ respectively represent the variances of those two processing groups on the same quantitative factor; and $n_1$ and $n_2$ respectively represent the sample sizes of those two processing groups.

Preferably, calculating the intergroup distribution difference value based on the first comprehensive index value and the second comprehensive index value includes:

calculating the intergroup distribution difference value using a third mathematical model, based on the first comprehensive index value and the second comprehensive index value.

Preferably, the third mathematical model is:

wherein $M$ represents the number of stratification factors, $m$ represents the $m$-th stratification factor, and $w_m$ represents the weight of the $m$-th stratification factor; when the attribute of the $m$-th stratification factor is qualitative data, $d_m$ is the first comprehensive index value; when the attribute of the $m$-th stratification factor is quantitative data, $d_m = d_n$, the second comprehensive index value.

Preferably, before calculating the intergroup distribution difference value based on the first comprehensive index value and the second comprehensive index value, the method further includes:

and normalizing the first comprehensive index value and/or the second comprehensive index value.
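By way of illustration only, the combination of per-factor index values into one intergroup distribution difference value can be sketched as a weighted sum over the stratification factors. The weighted-sum form, the max-based normalization and the function names `normalize` and `weighted_difference` are assumptions made for this sketch; the text above specifies only the weights, the per-factor values and that normalization may be applied.

```python
def normalize(values):
    """Scale the candidate scores of one factor into [0, 1] by dividing by
    the largest score (an assumed normalization scheme)."""
    top = max(values)
    return [v / top if top > 0 else 0.0 for v in values]


def weighted_difference(index_values, weights):
    """Weighted sum of per-factor index values d_m with weights w_m
    (assumed form of the third mathematical model)."""
    if len(index_values) != len(weights):
        raise ValueError("one weight per stratification factor is required")
    return sum(w * d for w, d in zip(weights, index_values))


# Hypothetical scores for two candidate groups on one qualitative factor.
scaled = normalize([1.0, 2.0])
# Hypothetical per-factor values for one candidate group, equal weights.
total = weighted_difference([0.5, 1.0], [1.0, 1.0])
```

Normalizing before the weighted sum keeps a factor with a naturally larger scale from dominating the result, which is one plausible reason for the normalization step.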

The present embodiment further provides a random grouping apparatus, including:

an acquisition module, used for obtaining a sample to be grouped, wherein the sample to be grouped comprises qualitative data and quantitative data;

a calculation module, used for adding the sample to be grouped to each processing group in turn and, for each processing group to which the sample is added, obtaining a corresponding first comprehensive index value based on the qualitative data of each processing group and the number of processing groups, and obtaining a corresponding second comprehensive index value based on the quantitative data of each processing group and the number of processing groups;

and a dividing module, used for calculating, based on the first comprehensive index value and the second comprehensive index value, an intergroup distribution difference value corresponding to each processing group to which the sample to be grouped is added, determining the processing group to which the sample was added when the intergroup distribution difference value is smallest, and assigning the sample to be grouped to that processing group.

The present embodiment also provides a computer device, including a memory and a processor, wherein the memory and the processor are communicatively connected to each other, computer instructions are stored in the memory, and the processor executes the computer instructions so as to perform the random grouping method described above.

The present embodiment also provides a computer-readable storage medium storing computer instructions for causing a computer to execute the random grouping method described above.

The technical scheme of the invention has the following advantages:

1. According to the random grouping method provided by the invention, the obtained sample to be grouped comprises qualitative data and quantitative data. A first comprehensive index value corresponding to each processing group to which the sample is added is obtained from the qualitative data of each processing group and the number of processing groups, and a second comprehensive index value corresponding to each processing group to which the sample is added is obtained from the quantitative data of each processing group and the number of processing groups.

The first comprehensive index value is obtained from the qualitative data of all the processing groups, so the influence of the qualitative data of the sample to be grouped on all the processing groups is taken into account. The second comprehensive index value is obtained directly from the quantitative data, without converting numerical variables into classification variables, so the information contained in the numerical values is preserved and the influence of the quantitative data of the sample to be grouped on all the processing groups is taken into account.

The intergroup distribution difference value of each candidate processing group is then obtained from the first comprehensive index value and the second comprehensive index value. Because this value reflects the balance of the processing groups more accurately, the processing group chosen for the sample to be grouped is more appropriate, which helps ensure the accuracy of the experiment.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flow chart of a random grouping method according to embodiment 1 of the present invention;

FIG. 2 is a block diagram of a random grouping apparatus according to embodiment 2 of the present invention;

FIG. 3 is a schematic structural diagram of a computer device according to embodiment 3 of the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the description of the present invention, it should be noted that the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

To ensure the authenticity and accuracy of a clinical trial, the experimental data are usually grouped according to the principle of randomization. The dynamic randomization algorithm currently in use is mainly the conventional minimization method. However, the conventional minimization method cannot process numerical variables directly: the numerical variables must first be converted into classification variables, and the converted classification variables are then used for calculation.

Converting a numerical variable into a classification variable loses information carried by the numerical values, and when several stratification factors are numerical variables, more information is lost, which seriously affects the accuracy of the experimental result. Moreover, when processing classification variables, the conventional minimization method considers only local balance among the processing groups, so the result computed for the classification variables may be inaccurate. Therefore, when classification variables and numerical variables must be handled simultaneously, the conventional minimization method cannot group samples well, which ultimately affects the accuracy of the experimental result.

Example 1

Fig. 1 is a flowchart illustrating that samples to be grouped are sequentially added to processing groups, a first comprehensive index value is obtained based on qualitative data of each processing group and the number of the processing groups, a second comprehensive index value is obtained based on quantitative data of each processing group and the number of the processing groups, and the samples to be grouped are classified into the processing groups based on the first comprehensive index value and the second comprehensive index value according to some embodiments of the present invention. Although the processes described below include operations that occur in a particular order, it should be clearly understood that the processes may include more or fewer operations that are performed sequentially or in parallel (e.g., using parallel processors or a multi-threaded environment).

This embodiment provides a random grouping method for grouping sample data so as to obtain the grouping with the smallest difference between groups. The sample data may be clinical data, such as patient gender, age, tumor diameter and blood pressure. As shown in FIG. 1, the method comprises the following steps:

S101, obtaining a sample to be grouped.

In the above implementation steps, the samples to be grouped include qualitative data and quantitative data. Wherein, the qualitative data belongs to the classification variable, and the quantitative data belongs to the numerical variable.

For example, the samples to be grouped include patient a (tumor diameter 20.1, female), patient b (tumor diameter 19.9, female) and patient c (tumor diameter 20, male). The qualitative data are then the genders: patient a, female; patient b, female; patient c, male. The quantitative data are the tumor diameters: patient a, 20.1; patient b, 19.9; patient c, 20.
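As a minimal sketch of how such samples might be represented, the three patients of the example can be held in a small record type that keeps the qualitative and quantitative fields separate. The class name `Sample` and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One sample to be grouped (field names are illustrative)."""
    sample_id: str
    gender: str            # qualitative data (classification variable)
    tumor_diameter: float  # quantitative data (numerical variable)


samples = [
    Sample("a", "female", 20.1),
    Sample("b", "female", 19.9),
    Sample("c", "male", 20.0),
]

# Split each sample into its qualitative and quantitative parts.
qualitative = {s.sample_id: s.gender for s in samples}
quantitative = {s.sample_id: s.tumor_diameter for s in samples}
```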

S102, adding the sample to be grouped to each processing group in turn and, for each processing group to which the sample is added, obtaining a corresponding first comprehensive index value based on the qualitative data of each processing group and the number of processing groups, and obtaining a corresponding second comprehensive index value based on the quantitative data of each processing group and the number of processing groups.

In the above implementation step, there are a plurality of processing groups, each of which may already contain grouped data, and the data in each processing group cover the same stratification factors as the sample to be grouped. For example, if the sample to be grouped is patient a, with a tumor diameter of 20.1 and female gender, each processing group contains tumor diameter and gender data, and the processing groups can be as shown in Table 1 below.

TABLE 1

Sequentially adding samples to be grouped into processing groups, and obtaining a first comprehensive index value corresponding to the processing groups added with the samples to be grouped based on the qualitative data and the number of the processing groups of each processing group, wherein the first comprehensive index value represents the influence of the qualitative data (namely classification variables) on the difference between the groups; and obtaining a second comprehensive index value corresponding to the processing groups added with the samples to be grouped based on the quantitative data of each processing group and the number of the processing groups, wherein the second comprehensive index value represents the influence of the quantitative data (namely numerical variables) on the difference between the groups.

The first comprehensive index value considers the balance of all qualitative data among all the processing groups, that is, the influence of the added qualitative data on the overall situation. The second comprehensive index value is calculated directly from the quantitative data, without converting numerical variables into classification variables, so the information contained in the numerical variables is not lost.

For example, as shown in Table 1, there are 2 processing groups, group A and group B, and the sample to be grouped, patient a with a tumor diameter of 20.1 and female gender, can be added to group A and group B in turn. Assuming the sample to be grouped is added to group A, a first comprehensive index value of group A is calculated from the genders (male or female) in group A and group B, and a second comprehensive index value of group A is calculated from the tumor diameters in group A and group B. Assuming the sample to be grouped is added to group B, a first comprehensive index value of group B is calculated from the genders (male or female) in group A and group B, and a second comprehensive index value of group B is calculated from the tumor diameters in group A and group B.

The first comprehensive index value can be calculated using the first mathematical model, and the second comprehensive index value can be calculated using the second mathematical model. That is, the qualitative data of each processing group and the number of processing groups are substituted into the first mathematical model to obtain the first comprehensive index value, and the quantitative data of each processing group and the number of processing groups are substituted into the second mathematical model to obtain the second comprehensive index value.

S103, calculating, based on the first comprehensive index value and the second comprehensive index value, an intergroup distribution difference value corresponding to each processing group to which the sample to be grouped is added, and determining the processing group to which the sample was added when the intergroup distribution difference value is smallest.

In the implementation step, an intergroup distribution difference value corresponding to the processing group to which the to-be-grouped sample is added is obtained according to the first comprehensive index value and the second comprehensive index value, and the intergroup distribution difference value combines the influence of qualitative data and quantitative data on the difference between the groups, so that the influence of the to-be-grouped sample (including the qualitative data and the quantitative data) on the balance between the groups can be more accurately reflected.

After the intergroup distribution difference values corresponding to all the processing groups are obtained, the processing group to which the sample to be grouped was added when the intergroup distribution difference value is smallest is determined. A larger intergroup distribution difference value means that the corresponding assignment makes the groups differ more; a smaller value means that it makes them differ less.

For example, as shown in Table 1 above, assuming the sample to be grouped (patient a, tumor diameter 20.1, female) is added to group A, a first comprehensive index value of group A and a second comprehensive index value of group A are calculated; assuming the same sample is added to group B, a first comprehensive index value of group B and a second comprehensive index value of group B are calculated.

The intergroup distribution difference value of group A is calculated from the first and second comprehensive index values of group A, and the intergroup distribution difference value of group B is calculated from the first and second comprehensive index values of group B. The two values are compared, and the processing group to which the sample to be grouped was added when the intergroup distribution difference value is smallest is determined. For example, if the intergroup distribution difference value of group A is the smallest, the processing group determined for patient a (tumor diameter 20.1, female) is group A.

S104, assigning the sample to be grouped to the processing group.

In the above implementation step, the processing group is the one to which the sample to be grouped was added when the intergroup distribution difference value is smallest. The smaller the intergroup distribution difference value, the less the groups differ after the sample to be grouped is added, so the experimental result is more accurate.

For example, as shown in Table 1 above, the sample to be grouped (patient a, tumor diameter 20.1, female) is added to group A and to group B in turn. If the intergroup distribution difference value of group A is determined to be the smallest, patient a is assigned to group A.
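Steps S102 to S104 can be sketched as a loop that adds the sample to each group on a trial basis, scores the result, and commits the sample to the group with the smallest score. In this sketch the scoring function `toy_difference` is only a stand-in, not the first, second or third mathematical model, and the group contents are hypothetical rather than taken from Table 1. Ties are resolved in favor of the first group encountered, which is also an assumption.

```python
import copy


def assign(sample, groups, difference):
    """Trial-add `sample` to every group, score each trial, and commit the
    sample to the group whose trial score is smallest."""
    scores = {}
    for name in groups:
        trial = copy.deepcopy(groups)   # leave the real groups untouched
        trial[name].append(sample)
        scores[name] = difference(trial)
    best = min(scores, key=scores.get)  # first group wins on ties
    groups[best].append(sample)
    return best, scores


def toy_difference(groups):
    """Stand-in score for two groups A and B: gap in female counts plus
    gap in mean tumor diameter."""
    def females(members):
        return sum(m["gender"] == "female" for m in members)

    def mean_diameter(members):
        return sum(m["diameter"] for m in members) / len(members) if members else 0.0

    a, b = groups["A"], groups["B"]
    return abs(females(a) - females(b)) + abs(mean_diameter(a) - mean_diameter(b))


# Hypothetical current groups and the new patient from the example.
groups = {
    "A": [{"gender": "male", "diameter": 20.0}],
    "B": [{"gender": "female", "diameter": 20.0}],
}
new_sample = {"gender": "female", "diameter": 20.1}
best, scores = assign(new_sample, groups, toy_difference)
```

With these hypothetical groups, adding the female patient to group A evens out the gender counts, so group A receives the lower score and the sample is committed there.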

In the above embodiment, the obtained sample to be grouped comprises qualitative data and quantitative data. A first comprehensive index value corresponding to each processing group to which the sample is added is obtained from the qualitative data of each processing group and the number of processing groups, and a second comprehensive index value corresponding to each processing group to which the sample is added is obtained from the quantitative data of each processing group and the number of processing groups.

The first comprehensive index value is obtained from the qualitative data of all the processing groups, so the influence of the qualitative data of the sample to be grouped on all the processing groups is taken into account. The second comprehensive index value is obtained directly from the quantitative data, without converting numerical variables into classification variables, so the information contained in the numerical values is preserved and the influence of the quantitative data of the sample to be grouped on all the processing groups is taken into account.

The intergroup distribution difference value of each candidate processing group is then obtained from the first comprehensive index value and the second comprehensive index value. Because this value reflects the balance of the processing groups more accurately, the processing group chosen for the sample to be grouped is more appropriate, which helps ensure the accuracy of the experiment.

In one or more embodiments, the first mathematical model is:

$$d_c = \sum_{g=1}^{G} \sum_{l=1}^{L} \frac{(f_a - f_e)^2}{f_e}$$

where $G$ represents the number of processing groups and $g$ the $g$-th processing group; $L$ represents the total number of categories of the same qualitative data and $l$ the $l$-th category; $f_a$ and $f_e$ respectively represent the observed frequency and the expected frequency of the qualitative factor of the sample to be grouped within the same qualitative data.

In one or more embodiments, the second mathematical model is:

where $G$ represents the number of processing groups, $g$ represents the $g$-th processing group, and $G!$ represents the factorial of $G$; $\bar{x}_1$ and $\bar{x}_2$ respectively represent the means, on the quantitative factor of the sample to be grouped, of any two processing groups taken from all the processing groups; $s_1^2$ and $s_2^2$ respectively represent the variances of those two processing groups on that quantitative factor; and $n_1$ and $n_2$ respectively represent the sample sizes of those two processing groups.

For example, in a clinical study with two treatment groups, group a and group B, tumor diameter and gender as stratification factors, the current distribution of subjects in each group of the study is shown in table 2 below, where "1" in the gender column indicates male gender and "2" indicates female gender.

TABLE 2

In this example, as shown in Table 2, the qualitative factor is gender, which has two categories (male and female), so $L = 2$ in the first mathematical model. In some embodiments, the total number of categories of the same qualitative data may be greater than 2. For example, lung cancer may be classified by pathological type into lung adenocarcinoma, lung squamous cell carcinoma and small cell lung cancer, in which case $L = 3$.

A new subject, a woman with a tumor diameter of 20.1, enters the study. The second comprehensive index value of group A and that of group B are calculated with the second mathematical model as follows:

1) Assume that the subject is assigned to group A

Group A mean:

Group B mean:

Group A variance:

Group B variance:

Since the number of processing groups in this example is 2, i.e. $G = 2$, there is only one pairwise combination. The second comprehensive index value corresponding to group A is $d_{nA} = 0.8664$.

2) Assume that the subject is assigned to group B

Group A mean:

Group B mean:

Group A variance:

Group B variance:

Since the number of processing groups in this example is 2, i.e. $G = 2$, there is only one pairwise combination. The second comprehensive index value corresponding to group B is $d_{nB} = 0.9573$.

When the second mathematical model is used to calculate the second comprehensive index value and there are three or more processing groups, the model must consider not only each pair formed by the processing group that receives the sample to be grouped and a processing group that does not, but also each pair formed by two different processing groups that do not receive the sample.

For example, suppose the processing groups are group A, group B, group C and group D. The second mathematical model then yields four second comprehensive index values: one assuming the sample is placed in group A, one for group B, one for group C and one for group D. When the second comprehensive index value of group A (with the sample to be grouped added) is calculated, the pairs A and B, A and C, and A and D must be considered, and so must the pairs B and C, B and D, and C and D. In other words, the second comprehensive index value for placing the sample in group A takes the relationships among all the processing groups into account, which makes the calculation result more accurate.
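The pairwise enumeration described above can be sketched in Python. The sketch visits every one of the G!/(2!(G-2)!) pairs of processing groups after tentatively adding the sample to one group. The per-pair statistic (an absolute Welch-type t statistic) and the averaging over pairs are assumptions made for illustration and are not confirmed as the second mathematical model; the names `pair_statistic` and `second_index` are hypothetical, and the data are invented.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import mean, variance


def pair_statistic(x, y):
    """Absolute Welch-type t statistic of two samples (an assumed choice)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return abs(mean(x) - mean(y)) / se


def second_index(groups, target, value):
    """Tentatively add `value` to group `target`, then average the
    pairwise statistic over every pair of groups (assumed aggregation)."""
    trial = {name: list(vals) for name, vals in groups.items()}
    trial[target].append(value)
    pairs = list(combinations(sorted(trial), 2))
    assert len(pairs) == comb(len(trial), 2)  # G! / (2! * (G - 2)!)
    return sum(pair_statistic(trial[a], trial[b]) for a, b in pairs) / len(pairs)


# Invented quantitative data for four groups (not taken from the patent's tables).
groups = {
    "A": [18.0, 21.5, 19.2],
    "B": [20.3, 22.1, 17.8],
    "C": [19.9, 18.4, 23.0],
    "D": [21.0, 20.2, 18.7],
}
pairs = list(combinations(sorted(groups), 2))
scores = {g: second_index(groups, g, 20.1) for g in groups}
```

With four groups, `pairs` holds the six combinations A-B, A-C, A-D, B-C, B-D and C-D, so each candidate assignment is scored against all groups rather than only against the group that receives the sample.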

The first comprehensive index value of group A and that of group B are calculated with the first mathematical model as follows:

1) Assume that the subject is assigned to group A, where:

$f_a(\text{A, male}) = 3$

$f_a(\text{A, female}) = 6$

$f_a(\text{B, male}) = 4$

$f_a(\text{B, female}) = 3$

The first comprehensive index value corresponding to group A is $d_{cA} = 0.9070$.

2) Assume that the subject is assigned to group B, where:

$f_a(\text{A, male}) = 3$

$f_a(\text{A, female}) = 5$

$f_a(\text{B, male}) = 4$

$f_a(\text{B, female}) = 4$

The first comprehensive index value corresponding to group B is $d_{cB} = 0.2540$.

It should be noted that, when the first mathematical model is applied to qualitative data with more than 2 categories, the observed frequency and the expected frequency of every category in every group need to be calculated. For example, suppose the processing groups are group C and group D, each of which contains the lung cancer pathological types lung adenocarcinoma, lung squamous cell carcinoma and small cell lung cancer. When a new subject e (lung squamous cell carcinoma) is tentatively added to group C or group D and the first comprehensive index values of group C and group D are calculated, the observed frequencies are $f_a$(C, lung adenocarcinoma), $f_a$(C, lung squamous cell carcinoma), $f_a$(C, small cell lung cancer), $f_a$(D, lung adenocarcinoma), $f_a$(D, lung squamous cell carcinoma) and $f_a$(D, small cell lung cancer); the expected frequencies are $f_e$(C, lung adenocarcinoma), $f_e$(C, lung squamous cell carcinoma), $f_e$(C, small cell lung cancer), $f_e$(D, lung adenocarcinoma), $f_e$(D, lung squamous cell carcinoma) and $f_e$(D, small cell lung cancer).
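The gender calculation above can be checked with a short Python sketch. It assumes the first comprehensive index has the Pearson chi-square form, with each expected frequency taken from the table margins (row total times column total divided by the grand total); that assumption reproduces the values 0.9070 and 0.2540 used in this example. The function name `first_index` is hypothetical, and the code assumes every row and column total is non-zero.

```python
def first_index(observed):
    """Chi-square style index over a groups-by-categories frequency table.

    observed[g][c] is the observed frequency f_a of category c in group g.
    The expected frequency f_e is derived from the table margins, which is
    an assumption of this sketch. All margins must be non-zero.
    """
    groups = list(observed)
    categories = list(next(iter(observed.values())))
    row = {g: sum(observed[g].values()) for g in groups}
    col = {c: sum(observed[g][c] for g in groups) for c in categories}
    total = sum(row.values())
    index = 0.0
    for g in groups:
        for c in categories:
            expected = row[g] * col[c] / total
            index += (observed[g][c] - expected) ** 2 / expected
    return index


# Frequencies from the worked example: the new female subject goes to A or to B.
if_a = {"A": {"male": 3, "female": 6}, "B": {"male": 4, "female": 3}}
if_b = {"A": {"male": 3, "female": 5}, "B": {"male": 4, "female": 4}}

d_c_a = first_index(if_a)
d_c_b = first_index(if_b)
print(round(d_c_a, 4), round(d_c_b, 4))  # prints: 0.907 0.254
```

Because the function loops over whatever categories the table contains, the same sketch covers the three-category lung cancer case by adding a third key to each group's dictionary.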

In one or more embodiments, the inter-group distribution difference value may be calculated with a third mathematical model: the first comprehensive index value and the second comprehensive index value of the same processing group are substituted into the third mathematical model to obtain the inter-group distribution difference value. The third mathematical model may be:

$$d = \sum_{m=1}^{M} w_m d_m$$

where $M$ represents the number of stratification factors, $m$ represents the $m$-th stratification factor, and $w_m$ represents the weight of the $m$-th stratification factor; when the $m$-th stratification factor is qualitative data, $d_m = d_c$, and when the $m$-th stratification factor is quantitative data, $d_m = d_n$.

For example, Table 2 above includes two stratification factors, tumor diameter and gender. To keep the calculation simple, the weights of tumor diameter and gender are both set to 1. The inter-group distribution difference value for group A is $d_A = d_{nA} + d_{cA} = 0.8664 + 0.9070 = 1.7734$, and the inter-group distribution difference value for group B is $d_B = d_{nB} + d_{cB} = 0.9573 + 0.2540 = 1.2113$.

It can be seen that adding the new subject to group A makes the groups more unbalanced, so the subject should be assigned to group B.
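The weighted combination and the final choice can be sketched as follows, using the index values reported in this example. The function name `difference_value` and the dictionary keys are hypothetical; the weighted-sum form matches the sums given above with both weights set to 1.

```python
def difference_value(indices, weights):
    """Weighted sum of per-factor index values: d = sum of w_m * d_m."""
    return sum(weights[m] * indices[m] for m in indices)


# Index values reported in the worked example (weights both set to 1).
weights = {"tumor_diameter": 1.0, "gender": 1.0}
candidates = {
    "A": {"tumor_diameter": 0.8664, "gender": 0.9070},
    "B": {"tumor_diameter": 0.9573, "gender": 0.2540},
}

totals = {g: difference_value(d, weights) for g, d in candidates.items()}
chosen = min(totals, key=totals.get)  # group with the smallest difference value
print({g: round(v, 4) for g, v in totals.items()}, chosen)
# prints: {'A': 1.7734, 'B': 1.2113} B
```

Changing an entry of `weights` shows how a researcher could let one stratification factor count more heavily than another without touching the index calculations.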

With the conventional minimization method, deciding whether a female patient with a tumor diameter of 20.1 joins group A or group B considers only the frequencies of the "tumor diameter ≥ 20" stratum and the "female" stratum in group A and group B. The results are shown in Table 3 below:

The frequency statistics show that assigning the new subject to group A produces the larger difference between the two groups, so the subject should be assigned to group B. Although the method of this embodiment and the conventional minimization method reach the same result here, the method of this embodiment considers the balance of the experiment as a whole rather than only part of it.

Statistical tests give the following picture. If only the difference in tumor size is considered, the new subject should be assigned to group A (the t-test p values for assignment to group A and to group B are 0.4226 and 0.3857 respectively, a difference of 0.0369); if only the difference in gender is considered, the new subject should be assigned to group B (the chi-square test p values for assignment to group A and to group B are 0.3409 and 0.6143 respectively, a difference of 0.2734). Taking both together, the two candidate assignments differ more on gender than on tumor size, so assigning the subject to group B better balances the experiment as a whole.

It should be noted that when the data include three or more stratification factors, the third mathematical model integrates the effects of all stratification factors on the inter-group difference. For example, for the three stratification factors of lung cancer pathological type, gender and age, calculating the inter-group distribution difference value of group E first requires the first comprehensive index value of lung cancer pathological type in group E, the first comprehensive index value of gender in group E, and the second comprehensive index value of age in group E. In some embodiments, the inter-group distribution difference value may instead be calculated as a product, or the third mathematical model may be adapted, for example by adding an adjustment parameter. The first mathematical model and the second mathematical model may likewise be adapted, for example by adding adjustment parameters.

In one or more embodiments, before the inter-group distribution difference value is calculated based on the first comprehensive index value and the second comprehensive index value, the method further includes normalizing the first comprehensive index value and/or the second comprehensive index value.

After the first comprehensive index value is obtained, it can be normalized to the interval [0, 1], that is:

$$d_c' = \frac{d_c - d_{c\min}}{d_{c\max} - d_{c\min}}$$

where $d_{c\min}$ is the minimum of all $d_c$ and $d_{c\max}$ is the maximum of all $d_c$.

After the second comprehensive index value is obtained, it can likewise be normalized to the interval [0, 1], that is:

$$d_n' = \frac{d_n - d_{n\min}}{d_{n\max} - d_{n\min}}$$

where $d_{n\min}$ is the minimum of all $d_n$ and $d_{n\max}$ is the maximum of all $d_n$.

Normalization is performed at the stage where the first and second comprehensive index values are calculated, so that the qualitative result and the quantitative result are on a comparable scale. If the researcher considers normalization unnecessary, the original value of the first comprehensive index value and/or the original value of the second comprehensive index value can be used directly in the next calculation. In that case, the scale can be corrected directly through the weights in the third mathematical model.
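A minimal Python sketch of the normalization step follows. It assumes min-max scaling, with the minimum and maximum taken over the candidate processing groups, and it assumes that an all-equal set of values maps to 0; the function name `min_max` is hypothetical.

```python
def min_max(values):
    """Scale a dict of index values to [0, 1] by min-max normalization."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumed convention: a zero range maps every value to 0.0.
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


# Index values from the worked example, keyed by candidate group.
d_c = {"A": 0.9070, "B": 0.2540}
d_n = {"A": 0.8664, "B": 0.9573}

d_c_norm = min_max(d_c)
d_n_norm = min_max(d_n)
```

With only two candidate groups, each factor's normalized values are exactly 0 and 1, so the weights in the third mathematical model carry more of the decision; with more groups the intermediate values become informative.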

In this embodiment, both normalization and weighting can be used simultaneously, or one of them can be used, and the specific situation can be determined according to the research needs. The method embodiment can be used for randomly grouping not only in the field of clinical medicine, but also in other fields.

Example 2

The present embodiment provides a random grouping apparatus, configured to group sample data so as to obtain the grouping with the minimum inter-group difference value. As shown in Fig. 2, the apparatus includes:

an obtaining module 201, configured to obtain a sample to be grouped; wherein the samples to be grouped comprise qualitative data and quantitative data. For details, please refer to the related description of step S101 in embodiment 1, which is not repeated herein.

A calculating module 202, configured to add the sample to be grouped to each processing group in turn, obtain a first comprehensive index value corresponding to the processing group to which the sample is added based on the qualitative data of each processing group and the number of processing groups, and obtain a second comprehensive index value corresponding to the processing group to which the sample is added based on the quantitative data of each processing group and the number of processing groups. For details, please refer to the related description of step S102 in embodiment 1, which is not repeated herein.

A dividing module 203, configured to calculate, based on the first comprehensive index value and the second comprehensive index value, an inter-group distribution difference value corresponding to the processing group to which the sample to be grouped is added, determine the processing group for which the inter-group distribution difference value is minimum, and assign the sample to be grouped to that processing group. For details, please refer to the related description of step S103 and step S104 in embodiment 1, which is not repeated herein.
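The division of work among the three modules can be sketched as a small Python class. Everything here is hypothetical wiring for illustration: the class name `RandomGrouper`, its method names, the two placeholder index functions and the data are assumptions, and the placeholders do not implement the first and second mathematical models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, object]
Groups = Dict[str, List[Sample]]


@dataclass
class RandomGrouper:
    first_index: Callable[[Groups], float]   # qualitative index (assumed callable)
    second_index: Callable[[Groups], float]  # quantitative index (assumed callable)
    groups: Groups

    def obtain(self, raw: Sample) -> Sample:
        # Obtaining module: the sample carries qualitative and quantitative data.
        return dict(raw)

    def calculate(self, sample: Sample) -> Dict[str, Tuple[float, float]]:
        # Calculating module: tentatively add the sample to each group in turn.
        results = {}
        for name in self.groups:
            trial = {g: list(members) for g, members in self.groups.items()}
            trial[name].append(sample)
            results[name] = (self.first_index(trial), self.second_index(trial))
        return results

    def divide(self, sample: Sample) -> str:
        # Dividing module: combine both indices (unit weights) and pick the minimum.
        totals = {g: dc + dn for g, (dc, dn) in self.calculate(sample).items()}
        best = min(totals, key=totals.get)
        self.groups[best].append(sample)
        return best


def sex_gap(groups: Groups) -> float:
    # Placeholder qualitative index: spread of female counts across groups.
    counts = [sum(s["sex"] == "F" for s in m) for m in groups.values()]
    return float(max(counts) - min(counts))


def diameter_gap(groups: Groups) -> float:
    # Placeholder quantitative index: spread of mean diameters across groups.
    means = [sum(s["diameter"] for s in m) / len(m) for m in groups.values()]
    return max(means) - min(means)


grouper = RandomGrouper(
    first_index=sex_gap,
    second_index=diameter_gap,
    groups={
        "A": [{"sex": "F", "diameter": 21.0}, {"sex": "F", "diameter": 19.0}],
        "B": [{"sex": "M", "diameter": 20.0}],
    },
)
assigned = grouper.divide(grouper.obtain({"sex": "F", "diameter": 20.1}))
```

Passing the index functions in as callables keeps the module wiring separate from the choice of mathematical model, so either model can be adjusted without changing the three methods.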

In the above embodiment, the sample to be grouped acquired by the obtaining module 201 includes qualitative data and quantitative data. The calculating module 202 obtains a first comprehensive index value corresponding to the processing group to which the sample is added, using the qualitative data of each processing group and the number of processing groups, and obtains a second comprehensive index value corresponding to the processing group to which the sample is added, using the quantitative data of each processing group and the number of processing groups. The first comprehensive index value is obtained from the qualitative data of all the processing groups, so the influence of the sample's qualitative data on all processing groups is taken into account. The second comprehensive index value is obtained directly from the quantitative data, so the numerical variable does not need to be converted into a categorical variable; the information contained in the numerical value is preserved, and the influence of the sample's quantitative data on all processing groups is taken into account. The inter-group distribution difference value of the corresponding processing group is then obtained from the first comprehensive index value and the second comprehensive index value. Because this value reflects the balance among the processing groups more accurately, the processing group selected by the dividing module 203 is more appropriate, which helps ensure the accuracy of the experiment.

Example 3

The present embodiment provides a computer device. As shown in Fig. 3, the device includes a processor 301 and a memory 302, which may be connected by a bus or by other means; Fig. 3 takes connection by a bus as an example.

Processor 301 may be a Central Processing Unit (CPU). The Processor 301 may also be other general purpose processors, Digital Signal Processors (DSPs), Graphics Processing Units (GPUs), embedded Neural Network Processors (NPUs), or other dedicated deep learning coprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or any combination thereof.

The memory 302, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules (e.g., the obtaining module 201, the calculating module 202, and the dividing module 203 shown in fig. 2) corresponding to the random grouping method in the embodiment of the present invention. The processor 301 executes various functional applications and data processing of the processor by running non-transitory software programs, instructions and modules stored in the memory 302, that is, implements the random grouping method in the above-described method embodiment 1.

The memory 302 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 301, and the like. Further, the memory 302 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 302 may optionally include memory located remotely from the processor 301, which may be connected to the processor 301 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory 302 and, when executed by the processor 301, perform the random grouping method in the embodiment shown in fig. 1.

In this embodiment, the memory 302 stores program instructions or modules of the random grouping method. When the processor 301 executes the program instructions or modules stored in the memory 302, the obtained sample to be grouped includes qualitative data and quantitative data; a first comprehensive index value corresponding to the processing group to which the sample is added is obtained using the qualitative data of each processing group and the number of processing groups, and a second comprehensive index value corresponding to the processing group to which the sample is added is obtained using the quantitative data of each processing group and the number of processing groups.

The first comprehensive index value is obtained from the qualitative data of all the processing groups, so the influence of the sample's qualitative data on all processing groups is taken into account. The second comprehensive index value is obtained directly from the quantitative data, so the numerical variable does not need to be converted into a categorical variable; the information contained in the numerical value is preserved, and the influence of the sample's quantitative data on all processing groups is taken into account.

The inter-group distribution difference value of the corresponding processing group is then obtained from the first comprehensive index value and the second comprehensive index value. Because this value reflects the balance among the processing groups more accurately, the processing group chosen for the sample to be grouped is more appropriate, which helps ensure the accuracy of the experiment.

Embodiments of the present invention further provide a non-transitory computer storage medium that stores computer-executable instructions, and the computer-executable instructions can execute the random grouping method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.

It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications derived therefrom remain within the scope of protection of the invention.
