Method and device for determining a model feature binning scheme

Document No. 1963838, published 2021-12-14

Reading note: This technology, "Method and device for determining a model feature binning scheme", was designed and created by Guo Yanyan (郭琰琰) and Lu Ling (陆凌) on 2021-11-02. Abstract: The invention provides a method and a device for determining a model feature binning scheme. The method comprises: acquiring the modeling data set corresponding to a modeling data set name that a user enters in a feature binning operation interface; reading feature variable names from the modeling data set; determining target variable names from the feature variable names; in response to the user's configuration instruction in the feature binning operation interface, setting binning parameters and the corresponding binning schemes; binning the target feature variables according to the binning parameters, and generating and displaying the binning results; screening and sorting the target feature variables according to the binning results to obtain a screening and sorting result; and determining, according to the screening and sorting result, the binning scheme that meets preset conditions as the optimal binning scheme. Technicians no longer need to test, one by one, the binning effect of every combination of binning method and number of bins, which reduces the time spent on feature binning and improves its efficiency.

1. A method of determining a model feature binning scheme, the method comprising:

acquiring the modeling data set corresponding to a modeling data set name entered by a user in a feature binning operation interface;

reading feature variable names from the modeling data set;

determining target variable names from the feature variable names, wherein the target feature variables corresponding to the target variable names carry at least a designated label;

in response to a configuration instruction from the user in the feature binning operation interface, setting binning parameters and the corresponding binning schemes;

binning the target feature variables according to the binning parameters, and generating and displaying binning results, wherein the binning results include at least preset index values corresponding to each target feature variable;

screening and sorting the target feature variables according to the binning results to obtain a screening and sorting result; and

determining, according to the screening and sorting result, the binning scheme that meets preset conditions as an optimal binning scheme.

2. The method of claim 1, wherein reading feature variable names from the modeling data set comprises:

reading the field names of a plurality of fields from the modeling data set, and determining the read field names as the feature variable names.

3. The method of claim 1, wherein the designated label is a good label or a bad label.

4. The method of claim 1, wherein the binning parameters include at least a number of bins and a binning method.

5. The method of claim 1, wherein binning the target feature variables according to the binning parameters and generating and displaying the binning results comprises:

performing grid combination on the target feature variables according to the binning parameters, and generating and displaying the binning results.

6. The method of claim 1, wherein the preset index values comprise at least one of: missing rate, information value (IV), Kolmogorov-Smirnov (KS) value, and population stability index (PSI) value.

7. The method of claim 1, wherein screening and sorting the target feature variables according to the binning results to obtain a screening and sorting result comprises:

eliminating the target feature variables whose preset index values do not meet preset screening conditions, and sorting the remaining target feature variables by their preset index values to obtain the screening and sorting result.

8. The method of claim 1, wherein after the target feature variables are screened and sorted according to the binning results to obtain the screening and sorting result, the method further comprises:

if no binning scheme meeting the preset conditions is determined according to the screening and sorting result, adjusting the binning parameters and the corresponding binning schemes in response to an adjustment instruction from the user, and returning to the step of binning the target feature variables according to the binning parameters and generating and displaying the binning results.

9. The method of claim 1, wherein after determining the binning scheme that meets the preset conditions as the optimal binning scheme, the method further comprises:

generating assignment code statements for the binning result corresponding to the optimal binning scheme.

10. An apparatus for determining a model feature binning scheme, the apparatus comprising:

an acquisition unit, configured to acquire the modeling data set corresponding to a modeling data set name entered by a user in a feature binning operation interface;

a reading unit, configured to read feature variable names from the modeling data set;

a determining unit, configured to determine target variable names from the feature variable names, wherein the target feature variables corresponding to the target variable names carry at least a designated label;

a setting unit, configured to set binning parameters and the corresponding binning schemes in response to a configuration instruction from the user in the feature binning operation interface;

a generating unit, configured to bin the target feature variables according to the binning parameters and to generate and display binning results, wherein the binning results include at least preset index values corresponding to each target feature variable; and

a processing unit, configured to screen and sort the target feature variables according to the binning results to obtain a screening and sorting result, and to determine, according to the screening and sorting result, the binning scheme that meets preset conditions as an optimal binning scheme.

Technical Field

The invention relates to the technical field of data processing, and in particular to a method and a device for determining a model feature binning scheme.

Background

During the development of a data model, feature binning must be performed on the model's feature variables.

In the existing approach, technicians specify the binning methods and the numbers of bins and then perform feature binning. When several binning methods and numbers of bins are specified, technicians must test and compare the binning effect of every combination of method and number of bins. As the number of combinations grows, the testing workload grows rapidly, and each binning effect has to be compared manually one by one, so feature binning takes a long time and is inefficient.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and a device for determining a model feature binning scheme, to solve the problems of long processing time and low efficiency in the existing feature binning approach.

To achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

A first aspect of the embodiments of the present invention discloses a method for determining a model feature binning scheme, comprising the following steps:

acquiring the modeling data set corresponding to a modeling data set name entered by a user in a feature binning operation interface;

reading feature variable names from the modeling data set;

determining target variable names from the feature variable names, wherein the target feature variables corresponding to the target variable names carry at least a designated label;

in response to a configuration instruction from the user in the feature binning operation interface, setting binning parameters and the corresponding binning schemes;

binning the target feature variables according to the binning parameters, and generating and displaying binning results, wherein the binning results include at least preset index values corresponding to each target feature variable;

screening and sorting the target feature variables according to the binning results to obtain a screening and sorting result; and

determining, according to the screening and sorting result, the binning scheme that meets preset conditions as an optimal binning scheme.

Preferably, reading the feature variable names from the modeling data set includes:

reading the field names of a plurality of fields from the modeling data set, and determining the read field names as the feature variable names.

Preferably, the designated label is a good label or a bad label.

Preferably, the binning parameters include at least the number of bins and the binning method.

Preferably, binning the target feature variables according to the binning parameters and generating and displaying the binning results includes:

performing grid combination on the target feature variables according to the binning parameters, and generating and displaying the binning results.

Preferably, the preset index values include at least: missing rate, information value (IV), Kolmogorov-Smirnov (KS) value, and population stability index (PSI) value.

Preferably, screening and sorting the target feature variables according to the binning results to obtain a screening and sorting result includes:

eliminating the target feature variables whose preset index values do not meet preset screening conditions, and sorting the remaining target feature variables by their preset index values to obtain the screening and sorting result.

Preferably, after the target feature variables are screened and sorted according to the binning results to obtain the screening and sorting result, the method further includes:

if no binning scheme meeting the preset conditions is determined according to the screening and sorting result, adjusting the binning parameters and the corresponding binning schemes in response to an adjustment instruction from the user, and returning to the step of binning the target feature variables according to the binning parameters and generating and displaying the binning results.

Preferably, after the binning scheme meeting the preset conditions is determined as the optimal binning scheme, the method further includes:

generating assignment code statements for the binning result corresponding to the optimal binning scheme.

A second aspect of the embodiments of the present invention discloses an apparatus for determining a model feature binning scheme, the apparatus including:

an acquisition unit, configured to acquire the modeling data set corresponding to a modeling data set name entered by a user in a feature binning operation interface;

a reading unit, configured to read feature variable names from the modeling data set;

a determining unit, configured to determine target variable names from the feature variable names, wherein the target feature variables corresponding to the target variable names carry at least a designated label;

a setting unit, configured to set binning parameters and the corresponding binning schemes in response to a configuration instruction from the user in the feature binning operation interface;

a generating unit, configured to bin the target feature variables according to the binning parameters and to generate and display binning results, wherein the binning results include at least preset index values corresponding to each target feature variable; and

a processing unit, configured to screen and sort the target feature variables according to the binning results to obtain a screening and sorting result, and to determine, according to the screening and sorting result, the binning scheme that meets preset conditions as an optimal binning scheme.

The method and device for determining a model feature binning scheme provided by the embodiments of the present invention work as follows: the modeling data set corresponding to a modeling data set name entered by a user in a feature binning operation interface is acquired; feature variable names are read from the modeling data set; target variable names are determined from the feature variable names; in response to the user's configuration instruction in the feature binning operation interface, binning parameters and the corresponding binning schemes are set; the target feature variables are binned according to the binning parameters, and the binning results are generated and displayed; the target feature variables are screened and sorted according to the binning results to obtain a screening and sorting result; and the binning scheme that meets preset conditions is determined, according to the screening and sorting result, to be the optimal binning scheme. In this scheme, the user sets different binning parameters and corresponding binning schemes through the feature binning operation interface, and the binning results produced with those parameters drive both the screening and sorting of the target feature variables and the choice of the optimal binning scheme. Technicians therefore no longer need to test, one by one, the binning effect of every combination of binning method and number of bins, which reduces the time spent on feature binning and improves its efficiency.

Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for determining a model feature binning scheme according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a binning scheme according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the result obtained after screening the target feature variables according to an embodiment of the present invention;

FIG. 4 is another flowchart of a method for determining a model feature binning scheme according to an embodiment of the present invention;

FIG. 5 is a structural block diagram of an apparatus for determining a model feature binning scheme according to an embodiment of the present invention;

FIG. 6 is another structural block diagram of an apparatus for determining a model feature binning scheme according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

In this application, the terms "comprises", "comprising", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

As described in the background, the current approach to feature binning relies mainly on technicians testing and comparing, one by one, the binning effects of different combinations of binning methods and numbers of bins. As these combinations multiply, the testing workload grows rapidly, and the binning effects have to be compared manually one by one, so feature binning takes a long time and is inefficient.

Therefore, the embodiments of the present invention provide a method and a device for determining a model feature binning scheme in which the user sets different binning parameters and corresponding binning schemes through a feature binning operation interface. The target feature variables are binned with these parameters, and the binning results are generated and displayed. The binning results are then used to screen and sort the target feature variables, and the binning scheme that meets the preset conditions is determined, according to the screening and sorting result, to be the optimal binning scheme. Technicians no longer need to test, one by one, the binning effect of every combination of binning method and number of bins, which reduces the time spent on feature binning and improves its efficiency.

Note that during data model development, a data model usually has a large number of candidate feature variables, both discrete and continuous (for example, several thousand). To keep the model stable and avoid overfitting, the effective feature variables that can be input into the model must be screened out from these candidates, which is why feature binning is performed on the candidate feature variables. The method and device for determining a model feature binning scheme provided in this scheme determine the optimal binning scheme automatically, reducing the time spent on feature binning and improving its efficiency. The details are described in the following embodiments.

Referring to FIG. 1, a flowchart of a method for determining a model feature binning scheme according to an embodiment of the present invention, the method includes the following steps:

Step S101: Acquire the modeling data set corresponding to the modeling data set name entered by the user in the feature binning operation interface.

In a specific implementation of step S101, a visual feature binning operation interface is displayed, in which the user enters the name of the modeling data set on which feature binning is to be computed. The corresponding modeling data set is then acquired using the name the user entered.

Note that the modeling data set contains data that has already been cleansed, for example by deduplication and by filling in missing values.

Step S102: Read the feature variable names from the modeling data set.

Note that each field in the modeling data set is a feature variable. In a specific implementation of step S102, the field names of a plurality of fields are read from the modeling data set, and the read field names are taken as the feature variable names.
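As a minimal sketch of steps S101 and S102, assume the modeling data sets are CSV texts registered under their names in a dictionary. The registry `REGISTRY`, the data set name `credit_2021`, its columns, and the function `load_modeling_dataset` are all hypothetical; the source does not say how data sets are stored. The sketch simply treats every CSV header field as a feature variable name, as the text describes.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical registry standing in for wherever modeling data sets are stored:
# it maps a data set name (as typed in the operation interface) to CSV text.
REGISTRY: Dict[str, str] = {
    "credit_2021": "education,annual_income,label\n"
                   "bachelor,52000,0\n"
                   "master,81000,1\n",
}


def load_modeling_dataset(name: str, registry: Dict[str, str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Step S101 (sketch): fetch the data set by the name the user entered.
    Step S102 (sketch): every field is a feature variable, so the header's
    field names are returned as the feature variable names."""
    if name not in registry:
        raise KeyError(f"unknown modeling data set: {name!r}")
    reader = csv.DictReader(io.StringIO(registry[name]))
    rows = list(reader)                          # one dict per record
    field_names = list(reader.fieldnames or [])  # header fields = feature variable names
    return field_names, rows


names, rows = load_modeling_dataset("credit_2021", REGISTRY)
print(names)  # ['education', 'annual_income', 'label']
```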

Step S103: Determine the target variable names from the feature variable names.

Note that a feature variable must carry the designated label before the subsequent feature binning and related data processing can be performed; that is, the target feature variable corresponding to a target variable name carries at least the designated label.

In some embodiments, the designated label is a good label or a bad label. For example, for a given target feature variable, a designated label value of 0 indicates a good label, and a value of 1 indicates a bad label.

In a specific implementation of step S103, the target variable names set by the user are obtained in response to a setting instruction in the feature binning operation interface; that is, the names of the variables to be binned (the target variable names) are selected from all the feature variable names.
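The label convention (0 = good, 1 = bad) and the selection in step S103 can be sketched as follows. This assumes, hypothetically, that the designated label is stored per record in a column named `label`; the function names are illustrative and not taken from the source.

```python
from typing import Dict, List, Sequence

GOOD, BAD = 0, 1  # designated label values from the text: 0 = good, 1 = bad


def check_designated_labels(rows: Sequence[Dict[str, str]], label_field: str) -> Dict[str, int]:
    """Confirm every record carries a good (0) or bad (1) label and count each kind."""
    counts = {"good": 0, "bad": 0}
    for i, row in enumerate(rows):
        value = int(row[label_field])
        if value == GOOD:
            counts["good"] += 1
        elif value == BAD:
            counts["bad"] += 1
        else:
            raise ValueError(f"row {i}: label {value} is neither good (0) nor bad (1)")
    return counts


def select_target_variables(feature_names: Sequence[str], chosen: Sequence[str], label_field: str) -> List[str]:
    """Step S103 (sketch): keep the user's selection, rejecting names that are
    not feature variables and excluding the label field itself."""
    unknown = [n for n in chosen if n not in feature_names]
    if unknown:
        raise ValueError(f"not feature variables: {unknown}")
    return [n for n in chosen if n != label_field]


rows = [{"education": "bachelor", "label": "0"},
        {"education": "master", "label": "1"},
        {"education": "phd", "label": "0"}]
print(check_designated_labels(rows, "label"))  # {'good': 2, 'bad': 1}
print(select_target_variables(["education", "label"], ["education"], "label"))  # ['education']
```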

Step S104: In response to the user's configuration instruction in the feature binning operation interface, set the binning parameters and the corresponding binning schemes.

In a specific implementation of step S104, multiple sets of binning parameters and their corresponding binning schemes are set in response to the user's configuration instruction in the feature binning operation interface.

In some embodiments, the binning parameters include at least the number of bins and the binning method. That is, in response to the user's configuration instruction in the feature binning operation interface, binning parameters such as the number of bins and the binning method (multiple methods may be selected) are set, together with the binning scheme corresponding to those parameters.
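One way to represent the configured parameter sets is one scheme per (binning method, number of bins) pair, each with its own output location. This is a sketch only: the method names `equal_width` and `equal_frequency`, the file-naming pattern, and the "at least 2 bins" rule are assumptions, since the source does not list which binning methods the interface offers.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List

# Hypothetical method names; the source does not say which binning methods are offered.
SUPPORTED_METHODS = ("equal_width", "equal_frequency")


@dataclass(frozen=True)
class BinningScheme:
    method: str       # binning method selected in the interface
    n_bins: int       # number of bins
    output_path: str  # where this scheme's binning result data set is stored


def build_schemes(methods: Iterable[str], bin_counts: Iterable[int], output_dir: str) -> List[BinningScheme]:
    """One scheme per (method, number of bins) combination selected by the user."""
    methods, bin_counts = list(methods), list(bin_counts)
    for m in methods:
        if m not in SUPPORTED_METHODS:
            raise ValueError(f"unsupported binning method: {m}")
    for n in bin_counts:
        if n < 2:  # assumption: a single bin carries no information
            raise ValueError(f"number of bins must be at least 2, got {n}")
    return [BinningScheme(m, n, f"{output_dir}/bins_{m}_{n}.csv")
            for m, n in product(methods, bin_counts)]


schemes = build_schemes(["equal_width", "equal_frequency"], [3, 5], "/tmp/binning")
print(len(schemes))  # 4
```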

Note that each binning scheme outputs its binning result as a data set; the scheme must be saved, and a file name and storage location must be specified for it. FIG. 2 shows an example binning scheme (for illustration only). In FIG. 2, the target feature variable named "education level" is divided into 3 bins (number of bins: 3), and the target feature variable named "payroll annual income" is divided into 5 bins (number of bins: 5). Good customers carry the good label and bad customers carry the bad label. The weight of evidence (WOE) is a parameter calculated during binning from the numbers of good and bad customers in each bin and the total numbers of good and bad customers.
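The WOE calculation described above is commonly written, per bin i, as WOE_i = ln((g_i / G) / (b_i / B)), where g_i and b_i are the good and bad customers in the bin and G and B are the totals; the information value is then IV = sum over i of (g_i / G - b_i / B) * WOE_i. The sketch below follows that convention. Some tools put bad over good instead, which flips the sign of WOE but leaves IV unchanged. The `smoothing` term for empty bins is an assumption, and the counts in the usage line are illustrative, not the data in FIG. 2.

```python
import math
from typing import List, Sequence, Tuple


def woe_iv(good: Sequence[int], bad: Sequence[int], smoothing: float = 0.5) -> Tuple[List[float], float]:
    """Per-bin weight of evidence (WOE) and total information value (IV).

    good[i] / bad[i] are the numbers of good / bad customers in bin i.
    `smoothing` is added to every count so that an empty bin does not cause
    log(0) or division by zero (an assumption; the source does not say how
    empty bins are handled). Pass 0.0 to use the raw counts.
    """
    if len(good) != len(bad) or not good:
        raise ValueError("good and bad must be non-empty and of equal length")
    g = [count + smoothing for count in good]
    b = [count + smoothing for count in bad]
    total_good, total_bad = sum(g), sum(b)
    woe: List[float] = []
    iv = 0.0
    for g_i, b_i in zip(g, b):
        dist_good = g_i / total_good  # share of all good customers in this bin
        dist_bad = b_i / total_bad    # share of all bad customers in this bin
        w = math.log(dist_good / dist_bad)
        woe.append(w)
        iv += (dist_good - dist_bad) * w
    return woe, iv


woe, iv = woe_iv([400, 300, 100], [20, 30, 50], smoothing=0.0)
print([round(w, 3) for w in woe], round(iv, 3))  # [0.916, 0.223, -1.386] 0.811
```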

Note that when setting the binning parameters, different parameters may be set for each target feature variable, or one set of parameters may be applied to all target feature variables collectively; the way the binning parameters are set is not limited here.
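Both ways of setting parameters (per variable, or collectively) can coexist if per-variable settings override a shared default. This is a minimal sketch under that assumption; the tuple representation `(method, number of bins)` and the function `resolve_parameters` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation of one parameter set: (binning method, number of bins).
Params = Tuple[str, int]


def resolve_parameters(
    target_vars: List[str],
    default: List[Params],
    overrides: Optional[Dict[str, List[Params]]] = None,
) -> Dict[str, List[Params]]:
    """Per-variable settings win; every other target variable uses the shared default."""
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(target_vars))
    if unknown:
        raise ValueError(f"parameters given for non-target variables: {unknown}")
    return {v: list(overrides.get(v, default)) for v in target_vars}


plan = resolve_parameters(
    ["education", "annual_income"],
    default=[("equal_frequency", 5)],
    overrides={"education": [("equal_width", 3)]},
)
print(plan)
# {'education': [('equal_width', 3)], 'annual_income': [('equal_frequency', 5)]}
```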

Step S105: Bin the target feature variables according to the binning parameters, and generate and display the binning results.

Note that the binning results include at least a preset index value for each target feature variable.

In some embodiments, the preset index values include the missing rate, the information value (IV), the Kolmogorov-Smirnov (KS) value, and the population stability index (PSI). These indexes are given only as examples; the preset index values may also include other related coefficients, which are not enumerated here.
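The other three indexes can be sketched from per-bin counts. KS is taken here as the largest gap between the cumulative bad and cumulative good distributions, scanning bins in order; PSI compares the bin distribution of a baseline sample (expected) with a later sample (actual) as the sum of (a - e) * ln(a / e). What counts as "missing" and the `eps` floor used in PSI are assumptions, not details from the source.

```python
import math
from typing import Optional, Sequence


def missing_rate(values: Sequence[Optional[object]]) -> float:
    """Share of records whose value is missing (None, empty string, or NaN)."""
    def is_missing(v: Optional[object]) -> bool:
        return v is None or v == "" or (isinstance(v, float) and math.isnan(v))

    if not values:
        return 0.0
    return sum(1 for v in values if is_missing(v)) / len(values)


def ks_from_bins(good: Sequence[int], bad: Sequence[int]) -> float:
    """Largest gap between cumulative good and bad distributions across ordered bins."""
    total_good, total_bad = sum(good), sum(bad)
    if total_good == 0 or total_bad == 0:
        raise ValueError("need at least one good and one bad record")
    cum_good = cum_bad = ks = 0.0
    for g, b in zip(good, bad):
        cum_good += g / total_good
        cum_bad += b / total_bad
        ks = max(ks, abs(cum_bad - cum_good))
    return ks


def psi(expected: Sequence[int], actual: Sequence[int], eps: float = 1e-6) -> float:
    """Population stability index between baseline (expected) and later (actual) bin counts."""
    e_total, a_total = sum(expected), sum(actual)
    value = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_total, eps)  # eps floor avoids log(0); an assumption
        a_pct = max(a / a_total, eps)
        value += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return value


print(missing_rate([1, None, "", 4]))                # 0.5
print(ks_from_bins([400, 300, 100], [20, 30, 50]))  # 0.375
print(round(psi([100, 100], [150, 50]), 4))         # 0.2747
```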

In a specific implementation of step S105, in response to the user's execution instruction in the feature binning operation interface, a preset binning program script is executed to bin the target feature variables according to the configured binning parameters, and the binning results are generated and displayed. Specifically, the script performs grid combination on the target feature variables according to the configured binning parameters to generate the binning results, which are then displayed in the feature binning operation interface.
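Reading "grid combination" as running the binning once for every (method, number of bins) pair, a binning script for one target feature variable could look like this sketch. Equal-width and equal-frequency binning are assumed example methods (the source does not name any), and the output per combination is the (good, bad) count of each bin, from which WOE, IV, and KS can then be computed.

```python
import bisect
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def equal_width_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Inner cut points splitting [min, max] into n_bins equally wide intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return sorted({lo + step * k for k in range(1, n_bins)})


def equal_frequency_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Inner cut points taken from the sorted values so bins hold roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    return sorted({ordered[(n * k) // n_bins] for k in range(1, n_bins)})  # set drops tied cuts


EDGE_FUNCS: Dict[str, Callable[[Sequence[float], int], List[float]]] = {
    "equal_width": equal_width_edges,
    "equal_frequency": equal_frequency_edges,
}


def count_by_bin(values: Sequence[float], labels: Sequence[int], edges: List[float]) -> List[Tuple[int, int]]:
    """(good, bad) counts per bin; a value equal to a cut point goes to the upper bin."""
    counts = [[0, 0] for _ in range(len(edges) + 1)]
    for value, label in zip(values, labels):
        counts[bisect.bisect_right(edges, value)][label] += 1  # label 0 = good, 1 = bad
    return [(good, bad) for good, bad in counts]


def grid_binning(values: Sequence[float], labels: Sequence[int],
                 methods: Sequence[str], bin_numbers: Sequence[int]) -> Dict[Tuple[str, int], List[Tuple[int, int]]]:
    """Bin one target feature variable under every (method, number of bins) combination."""
    return {
        (method, n): count_by_bin(values, labels, EDGE_FUNCS[method](values, n))
        for method, n in product(methods, bin_numbers)
    }


values = list(range(1, 11))
labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
results = grid_binning(values, labels, ["equal_width", "equal_frequency"], [2, 3])
print(results[("equal_frequency", 3)])  # [(3, 0), (2, 1), (1, 3)]
```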

As can be seen from the above, multiple sets of binning parameters and their corresponding binning schemes are set in advance, and each time one set of binning parameters is used to bin the target feature variables, the preset index values of every target feature variable are calculated to obtain the binning result for that run.

In other words, each target feature variable may have several binning results, one per set of binning parameters; for example, different binning schemes yield different IV values for the same target feature variable.

Step S106: Screen and sort the target feature variables according to the binning results to obtain a screening and sorting result.

Note that preset screening conditions for screening the target feature variables (set, for example, according to factors such as monotonicity and the correlation between variables) and preset sorting conditions for sorting them (for example, sorting by IV value and KS value) are defined in advance.

In a specific implementation of step S106, a program script for screening and sorting the target feature variables is executed to screen and sort them according to the binning results, yielding the screening and sorting result. Specifically, target feature variables whose preset index values do not meet the preset screening conditions are eliminated, and the remaining target feature variables are sorted by their preset index values according to the preset sorting conditions.

In other words, each preset screening condition sets a threshold on a preset index value: target feature variables whose index values reach the threshold are retained and may enter the data model, while those whose index values do not reach it are eliminated.

For example, assume the preset screening conditions are an IV value greater than 0.1 and a missing rate below 0.05. The result of screening the target feature variables by IV value and missing rate is shown in FIG. 3, in which the variable names are the target feature variable names; FIG. 3 is for illustration only.
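The screening and sorting step can be sketched with the example thresholds above (IV greater than 0.1, missing rate below 0.05) and with sorting by IV, then KS, in descending order. The record type `VariableResult` and the rejection-reason strings are hypothetical, and the variable values in the usage lines are illustrative, not the data shown in FIG. 3.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class VariableResult:
    name: str            # target feature variable name
    iv: float            # information value from the binning result
    ks: float            # KS value from the binning result
    missing_rate: float  # share of missing values


def screen_and_sort(
    results: Sequence[VariableResult],
    min_iv: float = 0.1,
    max_missing: float = 0.05,
) -> Tuple[List[VariableResult], List[Tuple[str, str]]]:
    """Drop variables failing the screening thresholds; sort the rest by IV, then KS."""
    kept: List[VariableResult] = []
    rejected: List[Tuple[str, str]] = []  # (variable name, screening rule it failed)
    for r in results:
        if not r.iv > min_iv:
            rejected.append((r.name, f"IV {r.iv:.3f} not > {min_iv}"))
        elif not r.missing_rate < max_missing:
            rejected.append((r.name, f"missing rate {r.missing_rate:.3f} not < {max_missing}"))
        else:
            kept.append(r)
    kept.sort(key=lambda r: (r.iv, r.ks), reverse=True)
    return kept, rejected


# Illustrative values only.
results = [
    VariableResult("education", 0.35, 0.21, 0.00),
    VariableResult("annual_income", 0.52, 0.30, 0.02),
    VariableResult("zip_code", 0.04, 0.05, 0.00),
    VariableResult("employer", 0.20, 0.15, 0.12),
]
kept, rejected = screen_and_sort(results)
print([r.name for r in kept])          # ['annual_income', 'education']
print([name for name, _ in rejected])  # ['zip_code', 'employer']
```

Returning the rejection reasons alongside the kept variables mirrors the text's note that the screening and sorting result also describes the screening rule and outcome for each variable.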

After the target feature variables are screened and sorted, the corresponding screening and sorting result is output and displayed in the feature binning operation interface.

Note that the screening and sorting result includes at least the target feature variable name, the KS value, a description of the screening rule, and the screening result.

Step S107: According to the screening and sorting result, determine the binning scheme that meets the preset conditions as the optimal binning scheme.

Note that the preset conditions are set in advance based on the configured thresholds, expert experience, variable re-selection, variable elimination, and the like. In a specific implementation of step S107, the binning scheme that meets the preset conditions is determined, according to the screening and sorting result, to be the optimal binning scheme.

In some specific embodiments, because the screening and sorting result is displayed in the feature binning operation interface, the user can analyze it qualitatively against the preset conditions to decide whether an optimal binning scheme has been found. In that case, the binning scheme that meets the preset conditions is determined to be the optimal binning scheme in response to the user's instruction in the feature binning operation interface.
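The text leaves the final decision to the user, but an automatic shortlist could be computed first. In this sketch, a scheme qualifies if its IV exceeds 0.1, its PSI is below 0.1, and its WOE values are monotonic across bins; the highest-IV qualifying scheme is proposed, with fewer bins preferred on ties. The PSI threshold, the tie-break, and the `SchemeResult` type are assumptions chosen for illustration, and the candidate values are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SchemeResult:
    method: str        # binning method of this scheme
    n_bins: int        # number of bins of this scheme
    iv: float          # information value under this scheme
    psi: float         # population stability index under this scheme
    woe: List[float]   # WOE of each bin, in bin order


def is_monotonic(xs: Sequence[float]) -> bool:
    """True if the sequence never decreases or never increases."""
    pairs = list(zip(xs, xs[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def pick_optimal_scheme(candidates: Sequence[SchemeResult],
                        min_iv: float = 0.1, max_psi: float = 0.1) -> Optional[SchemeResult]:
    """Highest-IV scheme meeting the (assumed) preset conditions, or None if none does."""
    eligible = [c for c in candidates
                if c.iv > min_iv and c.psi < max_psi and is_monotonic(c.woe)]
    if not eligible:
        return None  # no scheme qualifies; binning parameters need adjusting
    return max(eligible, key=lambda c: (c.iv, -c.n_bins))


candidates = [
    SchemeResult("equal_width", 3, 0.30, 0.02, [0.9, 0.2, -1.4]),
    SchemeResult("equal_frequency", 5, 0.42, 0.03, [0.8, 0.3, 0.5, -0.2, -1.1]),  # WOE not monotonic
    SchemeResult("equal_frequency", 3, 0.35, 0.15, [1.0, 0.1, -1.2]),             # PSI too high
]
best = pick_optimal_scheme(candidates)
print(best.method, best.n_bins)  # equal_width 3
```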

Preferably, the binning result corresponding to the optimal binning scheme is output and displayed in the feature binning operation interface.

Preferably, if no binning scheme meeting the preset conditions is determined according to the screening and sorting result, the binning parameters and the corresponding binning schemes are adjusted in response to the user's adjustment instruction, and step S105 is executed again.

In a specific implementation, if no binning scheme meeting the preset conditions has been found (that is, no optimal binning scheme has been determined), the user is given an interface for modifying the binning parameters. The binning parameters and the corresponding binning schemes are adjusted in response to the user's adjustment instruction in the feature binning operation interface, and the process returns to step S105 to continue binning until an optimal binning scheme is determined.
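The adjust-and-retry loop can be sketched with the binning, the selection, and the user's adjustment passed in as callables. The callable names are hypothetical, and `max_rounds` is an added safeguard that the text does not mention (it describes looping until an optimal scheme is found).

```python
from typing import Any, Callable, Optional, Tuple


def refine_until_optimal(
    params: Any,
    run_binning: Callable[[Any], Any],
    choose_optimal: Callable[[Any], Optional[Any]],
    ask_user_to_adjust: Callable[[Any, Any], Any],
    max_rounds: int = 10,
) -> Tuple[Any, int]:
    """Repeat step S105 onward until an optimal binning scheme is found.

    run_binning: bins the target feature variables (step S105).
    choose_optimal: screens, sorts and picks a scheme, or returns None (steps S106-S107).
    ask_user_to_adjust: stands in for the user editing parameters in the interface.
    max_rounds: safeguard against endless looping (an assumption, not in the source).
    """
    for round_no in range(1, max_rounds + 1):
        results = run_binning(params)
        best = choose_optimal(results)
        if best is not None:
            return best, round_no
        params = ask_user_to_adjust(params, results)
    raise RuntimeError(f"no optimal binning scheme after {max_rounds} rounds")


# Toy run: the "user" raises the number of bins until the selection accepts it.
best, rounds = refine_until_optimal(
    3,
    run_binning=lambda n: {"n_bins": n},
    choose_optimal=lambda r: r if r["n_bins"] >= 5 else None,
    ask_user_to_adjust=lambda n, _results: n + 1,
)
print(best, rounds)  # {'n_bins': 5} 3
```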

Preferably, after the optimal binning scheme is determined, assignment code statements are generated for the binning result corresponding to the optimal binning scheme; these statements preserve the binning result for subsequent data modeling.
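The source does not say what language the assignment code statements are written in or what they assign. As one hedged illustration, the sketch below emits Python `if`/`elif` statements that map a raw value to its bin's WOE, using right-open bins so that a value equal to a cut point falls in the upper bin. The output variable name `<var>_woe` is hypothetical; the variable name must be a valid identifier, and missing values are not handled.

```python
from typing import List, Sequence


def assignment_statements(var: str, edges: Sequence[float], woe: Sequence[float]) -> str:
    """Emit if/elif assignment statements mapping a raw value of `var` to its bin's WOE.

    Bins are right-open: a value equal to a cut point belongs to the upper bin.
    """
    if len(woe) != len(edges) + 1:
        raise ValueError("need exactly one WOE value per bin (len(edges) + 1)")
    target = f"{var}_woe"
    lines: List[str] = []
    for k, edge in enumerate(edges):
        keyword = "if" if k == 0 else "elif"
        lines.append(f"{keyword} {var} < {edge!r}:")
        lines.append(f"    {target} = {woe[k]!r}")
    if edges:
        lines.append("else:")
        lines.append(f"    {target} = {woe[-1]!r}")
    else:
        lines.append(f"{target} = {woe[0]!r}")  # a single bin needs no condition
    return "\n".join(lines)


code = assignment_statements("annual_income", [30000, 60000], [-0.4, 0.1, 0.7])
print(code)
# if annual_income < 30000:
#     annual_income_woe = -0.4
# elif annual_income < 60000:
#     annual_income_woe = 0.1
# else:
#     annual_income_woe = 0.7
```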

In the embodiment of the invention, the user sets different binning parameters and corresponding binning schemes through the feature binning operation interface. The target feature variables are binned according to the binning parameters, and the binning results are generated and displayed. The target feature variables are then screened and sorted based on the binning results to obtain a screening and sorting result, and the binning scheme that meets the preset conditions is determined as the optimal binning scheme according to that result. Technicians therefore no longer need to test, one by one, how each combination of binning method and number of bins performs, which reduces the time spent on feature binning and improves its efficiency.

To better explain the embodiment of the invention shown in fig. 1, fig. 4 shows another flowchart of the method for determining a model feature binning scheme. The method of fig. 4 comprises the following steps:

Step S401: Display the feature binning operation interface.

Step S402: Obtain the modeling dataset name.

Step S403: Read the feature variable names from the modeling dataset.

Step S404: Determine the target variable names from the feature variable names.

Step S405: Set the binning parameters and the corresponding binning scheme.

Step S406: Execute the binning program script.

Step S407: Output and display the binning results.

Step S408: Set the screening conditions and the sorting conditions.

Step S409: Screen and sort the target feature variables according to the binning results.

Step S410: Output and display the screening and sorting result.

Step S411: Determine the optimal binning scheme.

Step S412: Output and display the binning result corresponding to the optimal binning scheme.

Step S413: Generate the assignment code statement for the binning result corresponding to the optimal binning scheme.

It should be noted that, for the execution principle of steps S401 to S413, reference may be made to the embodiment shown in fig. 1 above; the details are not repeated here.

Corresponding to the method for determining a model feature binning scheme provided in the embodiment of the present invention, and referring to fig. 5, an embodiment of the present invention further provides a structural block diagram of an apparatus for determining a model feature binning scheme. The apparatus includes an acquisition unit 501, a reading unit 502, a determination unit 503, a setting unit 504, a generation unit 505, and a processing unit 506.

The acquisition unit 501 is configured to obtain the corresponding modeling dataset according to the modeling dataset name input by the user in the feature binning operation interface.

The reading unit 502 is configured to read feature variable names from the modeling dataset.

In a specific implementation, the reading unit 502 is specifically configured to read the field names of a plurality of fields from the modeling dataset and to determine the read field names as the feature variable names.
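The patent does not specify how the modeling dataset is stored. Assuming, purely for illustration, a CSV file whose first row is a header, reading the field names as feature variable names could look like this minimal sketch (the function name and file name are hypothetical):

```python
import csv
from typing import Iterable, List


def read_feature_names(lines: Iterable[str]) -> List[str]:
    """Return the header field names of a CSV-formatted modeling dataset.

    Assumes the first row is a header; `lines` may be an open text file or
    any iterable of text lines. An empty input yields an empty list.
    """
    header = next(csv.reader(lines), [])
    return [name.strip() for name in header]
```

Usage with a file on disk:

```python
with open("modeling_dataset.csv", newline="", encoding="utf-8") as f:  # hypothetical path
    feature_names = read_feature_names(f)
```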

The determination unit 503 is configured to determine target variable names from the feature variable names, where the target feature variable corresponding to each target variable name carries at least a designated tag.

In some embodiments, the designated tag is a good tag or a bad tag.

The setting unit 504 is configured to set the binning parameters and the corresponding binning scheme in response to the user's configuration instruction in the feature binning operation interface.

In some embodiments, the binning parameters include at least the number of bins and the binning method.
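The patent does not name the binning methods it supports. As an assumption, the sketch below shows two common unsupervised methods, equal-width and equal-frequency binning, each turning a number of bins into interior cut points; supervised methods such as chi-square merging are not shown. Bins are right-closed, and the function names and the `BINNING_METHODS` registry are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def equal_width_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Interior cut points that split [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return []  # constant variable: a single bin
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


def equal_frequency_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Interior cut points so that each bin holds roughly the same number of samples."""
    ordered = sorted(values)
    n = len(ordered)
    edges: List[float] = []
    for i in range(1, n_bins):
        cut = ordered[max(i * n // n_bins - 1, 0)]  # largest value in the i-th bin
        # Skip cuts repeated by tied values and cuts that would leave the top bin empty.
        if cut < ordered[-1] and (not edges or cut > edges[-1]):
            edges.append(cut)
    return edges


def assign_bin(value: float, edges: List[float]) -> int:
    """Bin index for right-closed bins (-inf, e0], (e0, e1], ..., (e_last, +inf)."""
    return bisect_left(edges, value)


# Hypothetical registry mapping a binning-method name to its cut-point function.
BINNING_METHODS = {"equal_width": equal_width_edges, "equal_frequency": equal_frequency_edges}
```

For the values 1 to 10 and five bins, `equal_frequency_edges` returns `[2, 4, 6, 8]`, so each bin holds two values.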

The generation unit 505 is configured to bin the target feature variables according to the binning parameters and to generate and display the binning results, where the binning results include at least the preset index values corresponding to each target feature variable.

In a specific implementation, the generation unit 505 is specifically configured to perform a grid combination of the target feature variables according to the binning parameters, that is, to bin them under every combination of the configured binning methods and numbers of bins, and to generate and display the binning results.
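A grid combination can be enumerated with a Cartesian product. This minimal sketch assumes the binning parameters are given as lists of method names and bin counts; each resulting `(variable, method, number of bins)` tuple is one binning task, which is what spares technicians from trying the combinations by hand. The function name is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def build_binning_grid(
    variables: Sequence[str], methods: Sequence[str], bin_counts: Sequence[int]
) -> List[Tuple[str, str, int]]:
    """Enumerate every (variable, binning method, number of bins) task to run."""
    return list(product(variables, methods, bin_counts))
```

For two variables, two methods, and two bin counts, the grid contains 2 × 2 × 2 = 8 tasks, beginning with `("age", "equal_width", 5)` when called as `build_binning_grid(["age", "income"], ["equal_width", "equal_frequency"], [5, 10])`.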

In some embodiments, the preset index values include the missing rate, the information value (IV), the KS value, and the population stability index (PSI).
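The patent lists these indices without formulas. The sketch below uses the common textbook definitions, which are an assumption here. Let p_bad_i and p_good_i be the shares of all bad and all good samples that fall in bin i. Then WOE_i = ln(p_bad_i / p_good_i) and IV = Σ (p_bad_i − p_good_i) · WOE_i. KS is the largest gap between the cumulative bad and good shares across the ordered bins. For bin shares e_i of a baseline sample and a_i of a comparison sample, PSI = Σ (a_i − e_i) · ln(a_i / e_i). The smoothing constant `EPS` is an assumed choice that keeps empty bins from producing log(0).

```python
import math
from typing import Optional, Sequence, Tuple

EPS = 1e-6  # assumed smoothing constant: keeps empty bins from producing log(0)


def missing_rate(values: Sequence[Optional[float]]) -> float:
    """Share of records whose value is missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def iv_and_ks(good: Sequence[int], bad: Sequence[int]) -> Tuple[float, float]:
    """IV and KS from per-bin counts of good and bad samples, bins in ascending order.

    Both classes must be present (non-zero totals).
    """
    g_total, b_total = sum(good), sum(bad)
    iv = ks = cum_g = cum_b = 0.0
    for g, b in zip(good, bad):
        pg, pb = g / g_total, b / b_total
        iv += (pb - pg) * math.log((pb + EPS) / (pg + EPS))  # (p_bad - p_good) * WOE
        cum_g += pg
        cum_b += pb
        ks = max(ks, abs(cum_b - cum_g))  # largest gap between cumulative distributions
    return iv, ks


def psi(expected: Sequence[int], actual: Sequence[int]) -> float:
    """Population stability index between per-bin counts of a baseline and a new sample."""
    e_total, a_total = sum(expected), sum(actual)
    total = 0.0
    for e, a in zip(expected, actual):
        pe, pa = e / e_total + EPS, a / a_total + EPS
        total += (pa - pe) * math.log(pa / pe)
    return total
```

Because each IV term multiplies a difference by the log of the matching ratio, IV does not depend on whether WOE is defined as bad-over-good or good-over-bad.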

The processing unit 506 is configured to screen and sort the target feature variables according to the binning results to obtain a screening and sorting result, and to determine the binning scheme that meets the preset conditions as the optimal binning scheme according to that result.

In a specific implementation, when obtaining the screening and sorting result, the processing unit 506 is specifically configured to eliminate the target feature variables whose preset index values do not meet the preset screening conditions, and to sort the remaining target feature variables by their preset index values.
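The screening and sorting step can be sketched as a filter followed by a sort. This minimal Python sketch assumes each variable's index values are held in a dict with the hypothetical keys `missing_rate`, `iv` and `psi`. The default thresholds are placeholders for the preset screening conditions, and sorting by descending IV is one assumed choice of sorting condition.

```python
from typing import Dict, List


def screen_and_sort(
    indexes: Dict[str, Dict[str, float]],
    max_missing_rate: float = 0.8,
    min_iv: float = 0.02,
    max_psi: float = 0.25,
    sort_key: str = "iv",
) -> List[str]:
    """Drop variables failing any screening threshold; sort the rest by `sort_key`, descending."""
    kept = [
        name
        for name, idx in indexes.items()
        if idx["missing_rate"] <= max_missing_rate
        and idx["iv"] >= min_iv
        and idx["psi"] <= max_psi
    ]
    return sorted(kept, key=lambda name: indexes[name][sort_key], reverse=True)
```

With illustrative values, a variable with a high IV is still eliminated when its missing rate exceeds the threshold. This is why screening runs before sorting.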

Preferably, the processing unit 506 is further configured to: if no binning scheme meeting the preset conditions is determined according to the screening and sorting result, adjust the binning parameters and the corresponding binning scheme in response to the user's adjustment instruction, and invoke the generation unit 505 again.

In the embodiment of the invention, the apparatus achieves the same effect as the method: the user configures different binning parameters and corresponding binning schemes through the feature binning operation interface, the target feature variables are binned, screened, and sorted, and the scheme meeting the preset conditions is selected as the optimal binning scheme. Technicians therefore no longer need to test each combination of binning method and number of bins one by one, which reduces the time spent on feature binning and improves its efficiency.

Preferably, referring to fig. 6 in conjunction with fig. 5, which shows another structural block diagram of the apparatus for determining a model feature binning scheme according to an embodiment of the present invention, the apparatus further includes:

a statement generation unit 507, configured to generate the assignment code statement for the binning result corresponding to the optimal binning scheme.

In summary, the embodiments of the present invention provide a method and an apparatus for determining a model feature binning scheme, in which the user sets different binning parameters and corresponding binning schemes through the feature binning operation interface. The target feature variables are binned according to the binning parameters, and the binning results are generated and displayed. The target feature variables are then screened and sorted based on those results, and the binning scheme meeting the preset conditions is determined as the optimal binning scheme. Technicians no longer need to test, one by one, how each combination of binning method and number of bins performs, which reduces the time spent on feature binning and improves its efficiency.

The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system or apparatus embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, reference may be made to the description of the method embodiments. The system and apparatus embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without inventive effort.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
