Information generation method and device, electronic equipment and computer readable medium

Document No.: 35634  Published: 2021-09-24

Reading note: The technology "Information generation method and device, electronic equipment and computer readable medium" was designed and created by Yu Ying (于莹) and Zhuang Xiaotian (庄晓天) on 2021-06-09. Its main content is as follows. The embodiments of the disclosure disclose an information generation method, an information generation device, electronic equipment and a computer readable medium. One embodiment of the method comprises: determining a candidate source domain sample data set according to candidate target domain sample data; performing data cleaning processing on the candidate target domain sample data and the candidate source domain sample data set to generate target domain sample data and a source domain sample data set; determining data type information corresponding to the target domain sample data, wherein the data type information is used for representing data discontinuity and data sparsity of the target domain sample data; determining a prediction model matched with the data type information, wherein the prediction model is obtained by training according to the source domain sample data set; and generating prediction information based on the prediction model and the target domain sample data. This embodiment improves the accuracy of the generated prediction information.

1. An information generating method, comprising:

determining a candidate source domain sample data set according to candidate target domain sample data;

performing data cleaning processing on the candidate target domain sample data and the candidate source domain sample data set to generate target domain sample data and a source domain sample data set;

determining data type information corresponding to the target domain sample data, wherein the data type information is used for representing data discontinuity and data sparsity of the target domain sample data;

determining a prediction model matched with the data type information, wherein the prediction model is obtained by training according to the source domain sample data set;

and generating prediction information based on the prediction model and the target domain sample data.

2. The method of claim 1, wherein said generating prediction information based on said prediction model and said target domain sample data comprises:

constructing a feature vector corresponding to the target domain sample data;

and generating the prediction information according to the target domain sample data, the feature vector and the prediction model.

3. The method according to claim 1, wherein the determining data type information corresponding to the target domain sample data comprises:

determining data groups in which the demand in the target domain sample data is consecutively equal to a target value, to obtain at least one data group;

and determining a data discontinuity value included in the data type information according to the number of data groups in the at least one data group and the interval lengths of the data groups in the at least one data group.

4. The method according to claim 3, wherein the determining data type information corresponding to the target domain sample data further comprises:

determining the number of data items in the target domain sample data whose demand equals the target value;

and determining a data sparsity value included in the data type information according to that number and the total amount of data in the target domain sample data.

5. The method of claim 4, wherein the determining a prediction model that matches the data type information comprises:

in response to determining that the data discontinuity value is greater than a first threshold and the data sparsity value is greater than a second threshold, determining a first target prediction model as the prediction model matched with the data type information, wherein the first target prediction model is composed of a target transfer learning classification model and a target regression model.

6. The method of claim 4, wherein the determining a prediction model that matches the data type information comprises:

in response to determining that the data discontinuity value is greater than a first threshold and the data sparsity value is less than or equal to a second threshold, determining a second target prediction model as the prediction model matched with the data type information.

7. The method of claim 5, wherein the prediction model is trained by:

determining the source domain sample data set and/or the target domain sample data as a training sample set;

and training an initial prediction model according to the training sample set to generate the prediction model.

8. The method of claim 7, wherein the training an initial prediction model according to the training sample set to generate the prediction model comprises:

in response to determining that the proportion of training samples in the training sample set is greater than a third threshold, performing positive and negative sample equalization processing on the training samples in the training sample set to generate an equalized training sample set;

and training the initial prediction model according to the training sample set after the equalization processing so as to generate the prediction model.

9. The method of claim 1, wherein said determining a set of candidate source domain sample data from candidate target domain sample data comprises:

determining the candidate source domain sample data set from the candidate target domain sample data through a time-series similarity analysis algorithm.

10. The method of claim 1, wherein the method further comprises:

controlling a transfer device according to the prediction information to replenish the goods corresponding to the target domain sample data.

11. An information generating apparatus comprising:

a first determining unit configured to determine a candidate source domain sample data set according to candidate target domain sample data;

a data cleaning processing unit configured to perform data cleaning processing on the candidate target domain sample data and the candidate source domain sample data set to generate target domain sample data and a source domain sample data set;

a second determining unit, configured to determine data type information corresponding to the target domain sample data, where the data type information is used to characterize data discontinuity and data sparsity of the target domain sample data;

a third determining unit configured to determine a prediction model matched with the data type information, wherein the prediction model is trained according to the source domain sample data set;

a generating unit configured to generate prediction information based on the prediction model and the target domain sample data.

12. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.

13. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1 to 10.

Technical Field

Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to an information generation method, an information generation device, an electronic device, and a computer-readable medium.

Background

Demand forecasting refers to techniques for estimating future trends from existing data. When the prediction granularity is refined (for example, to daily predictions), intermittent demand, that is, demand that is not continuous over a period of time, becomes very common for certain items. Predicting such demand has long been a difficult problem in the industry.

Existing intermittent-demand prediction methods often have the following technical problem:

when data are extremely sparse (for example, when zero-demand periods account for more than 50% of the data), prediction accuracy is low. When the predicted quantity exceeds the actual demand, goods easily become overstocked in the warehouse; when the predicted quantity falls short of the actual demand, a large amount of inventory resources is often wasted. In both cases, the utilization rate of inventory resources is low.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Some embodiments of the present disclosure propose information generation methods, apparatuses, electronic devices, and computer readable media to solve one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide an information generating method, including: determining a candidate source domain sample data set according to candidate target domain sample data; performing data cleaning processing on the candidate target domain sample data and the candidate source domain sample data set to generate target domain sample data and a source domain sample data set; determining data type information corresponding to the target domain sample data, wherein the data type information is used for representing data discontinuity and data sparsity of the target domain sample data; determining a prediction model matched with the data type information, wherein the prediction model is obtained by training according to the source domain sample data set; and generating prediction information based on the prediction model and the target domain sample data.

Optionally, the generating prediction information based on the prediction model and the target domain sample data includes: constructing a feature vector corresponding to the target domain sample data; and generating the prediction information according to the target domain sample data, the feature vector and the prediction model.

Optionally, the determining data type information corresponding to the target domain sample data includes: determining data groups in which the demand in the target domain sample data is consecutively equal to a target value, to obtain at least one data group; and determining a data discontinuity value included in the data type information according to the number of data groups in the at least one data group and the interval lengths of those data groups.
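The data-group step above can be illustrated with a short Python sketch. It assumes the target value is 0 (the example value used later in the detailed description) and that demand is a list ordered by date; the function name `target_value_groups` is hypothetical. Because the disclosure does not state how the group count and the interval lengths are combined into the discontinuity value, the sketch stops at those two inputs.

```python
from itertools import groupby
from typing import List, Tuple


def target_value_groups(demand: List[float], target: float = 0) -> Tuple[int, List[int]]:
    """Find maximal runs of consecutive demand values equal to `target`.

    Returns the number of such data groups and the interval length
    (number of consecutive periods) of each group.
    """
    lengths = [
        sum(1 for _ in run)          # length of this run of equal-keyed values
        for is_target, run in groupby(demand, key=lambda x: x == target)
        if is_target                 # keep only runs made of the target value
    ]
    return len(lengths), lengths


# Hypothetical daily demand with three zero-demand runs of lengths 2, 3 and 1.
count, lengths = target_value_groups([5, 0, 0, 3, 0, 0, 0, 4, 0, 2])
print(count, lengths)  # 3 [2, 3, 1]
```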

Optionally, the determining data type information corresponding to the target domain sample data further includes: determining the number of data items in the target domain sample data whose demand equals the target value; and determining a data sparsity value included in the data type information according to that number and the total amount of data in the target domain sample data.

Optionally, the determining a prediction model matching the data type information includes: in response to determining that the data discontinuity value is greater than a first threshold and the data sparsity value is greater than a second threshold, determining a first target prediction model as the prediction model matched with the data type information, wherein the first target prediction model is composed of a target transfer learning classification model and a target regression model.

Optionally, the determining a prediction model matching the data type information includes: in response to determining that the data discontinuity value is greater than the first threshold and the data sparsity value is less than or equal to the second threshold, determining a second target prediction model as the prediction model matched with the data type information.

Optionally, the prediction model is obtained by training through the following steps: determining the source domain sample data set and/or the target domain sample data as a training sample set; and training an initial prediction model according to the training sample set to generate the prediction model.

Optionally, the training an initial prediction model according to the training sample set to generate the prediction model includes: in response to determining that the proportion of training samples in the training sample set is greater than a third threshold, performing positive and negative sample equalization on the training samples in the training sample set to generate an equalized training sample set; and training the initial prediction model according to the equalized training sample set to generate the prediction model.
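The equalization step does not say which proportion is compared with the third threshold or how equalization is performed. The Python sketch below assumes the proportion is the share of the majority class (for example, zero-demand negative samples) and uses random undersampling, one common balancing technique, as a stand-in; the function `equalize` and the (features, label) layout are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical sample layout: (feature vector, label), label 1 = positive, 0 = negative.
Sample = Tuple[List[float], int]


def equalize(samples: List[Sample], threshold: float, seed: int = 0) -> List[Sample]:
    """Randomly undersample the majority class when its share exceeds `threshold`."""
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    majority, minority = (pos, neg) if len(pos) >= len(neg) else (neg, pos)
    # Nothing to balance: empty set, a single class, or majority share within threshold.
    if not minority or len(majority) / len(samples) <= threshold:
        return list(samples)
    rng = random.Random(seed)                    # fixed seed for reproducibility
    kept = rng.sample(majority, len(minority))   # shrink majority to minority size
    balanced = kept + minority
    rng.shuffle(balanced)
    return balanced


data = [([float(i)], 0) for i in range(8)] + [([float(i)], 1) for i in range(2)]
out = equalize(data, threshold=0.7)
print(sum(1 for _, y in out if y == 0), sum(1 for _, y in out if y == 1))  # 2 2
```

Oversampling the minority class would be an equally plausible reading of "equalization"; undersampling is used here only because it keeps the sketch short.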

Optionally, the determining a candidate source domain sample data set according to the candidate target domain sample data includes: determining the candidate source domain sample data set from the candidate target domain sample data through a time-series similarity analysis algorithm.

Optionally, the method further includes: controlling a transfer device according to the prediction information to replenish the goods corresponding to the target domain sample data.

In a second aspect, some embodiments of the present disclosure provide an information generating apparatus, the apparatus comprising: a first determining unit configured to determine a candidate source domain sample data set according to candidate target domain sample data; a data cleaning processing unit configured to perform data cleaning processing on the candidate target domain sample data and the candidate source domain sample data set to generate target domain sample data and a source domain sample data set; a second determining unit configured to determine data type information corresponding to the target domain sample data, where the data type information is used to characterize data discontinuity and data sparsity of the target domain sample data; a third determining unit configured to determine a prediction model matching the data type information, where the prediction model is trained according to the source domain sample data set; and a generating unit configured to generate prediction information based on the prediction model and the target domain sample data.

Optionally, the generating unit is configured to: construct a feature vector corresponding to the target domain sample data; and generate the prediction information according to the target domain sample data, the feature vector and the prediction model.

Optionally, the second determining unit is configured to: determine data groups in which the demand in the target domain sample data is consecutively equal to a target value, to obtain at least one data group; and determine a data discontinuity value included in the data type information according to the number of data groups in the at least one data group and the interval lengths of those data groups.

Optionally, the second determining unit is further configured to: determine the number of data items in the target domain sample data whose demand equals the target value; and determine a data sparsity value included in the data type information according to that number and the total amount of data in the target domain sample data.

Optionally, the third determining unit is configured to: in response to determining that the data discontinuity value is greater than a first threshold and the data sparsity value is greater than a second threshold, determine a first target prediction model as the prediction model matched with the data type information, wherein the first target prediction model is composed of a target transfer learning classification model and a target regression model.

Optionally, the third determining unit is configured to: in response to determining that the data discontinuity value is greater than the first threshold and the data sparsity value is less than or equal to the second threshold, determine a second target prediction model as the prediction model matched with the data type information.

Optionally, the prediction model is obtained by training through the following steps: determining the source domain sample data set and/or the target domain sample data as a training sample set; and training an initial prediction model according to the training sample set to generate the prediction model.

Optionally, the training an initial prediction model according to the training sample set to generate the prediction model includes: in response to determining that the proportion of training samples in the training sample set is greater than a third threshold, performing positive and negative sample equalization on the training samples in the training sample set to generate an equalized training sample set; and training the initial prediction model according to the equalized training sample set to generate the prediction model.

Optionally, the first determining unit is configured to: determine the candidate source domain sample data set from the candidate target domain sample data through a time-series similarity analysis algorithm.

Optionally, the apparatus is further configured to control a transfer device according to the prediction information to replenish the goods corresponding to the target domain sample data.

In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.

In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.

The above embodiments of the present disclosure have the following beneficial effects: the information generation method of some embodiments of the present disclosure reduces the waste of inventory resources and improves their utilization rate. Specifically, inventory utilization is low because existing intermittent-demand prediction methods have low accuracy when data are extremely sparse. To address this, some embodiments of the present disclosure first determine a candidate source domain sample data set according to the candidate target domain sample data; this set provides training samples for subsequently training the prediction model. Second, data cleaning is performed on the candidate target domain sample data and the candidate source domain sample data set to generate the target domain sample data and the source domain sample data set. In practice, dirty data can degrade both model training and prediction accuracy, and data cleaning removes it. Next, data type information corresponding to the target domain sample data is determined, where the data type information characterizes the data discontinuity and data sparsity of the target domain sample data. Then, a prediction model matching the data type information is determined. In practice, different data often have different characteristics (for example, different data sparsity and/or different data discontinuity). Using a single prediction model for all types of data usually requires a complex model structure and a large number of training samples, which demands a large volume of training data and a long training time.

Therefore, selecting a prediction model that matches the data type information reduces both the complexity of the model structure and the required volume of training data. Finally, prediction information is generated based on the prediction model and the target domain sample data. This improves prediction accuracy, which in turn reduces the waste of inventory resources and improves their utilization rate.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

FIG. 1 is a schematic diagram of one application scenario of the information generation method of some embodiments of the present disclosure;

FIG. 2 is a flow diagram of some embodiments of an information generation method according to the present disclosure;

FIG. 3 is a schematic diagram of target domain sample data;

FIG. 4 is a flow diagram of further embodiments of an information generation method according to the present disclosure;

FIG. 5 is a schematic block diagram of some embodiments of an information generating apparatus according to the present disclosure;

FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be noted that, for convenience of description, only the portions relevant to the invention are shown in the drawings. The embodiments of the present disclosure and the features of the embodiments may be combined with each other provided they do not conflict.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It should be noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be read as "one or more".

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 is a schematic diagram of an application scenario of an information generation method of some embodiments of the present disclosure.

In the application scenario of fig. 1, first, the computing device 101 may determine a candidate source domain sample data set 103 according to the candidate target domain sample data 102; secondly, the computing device 101 may perform data cleaning processing on the candidate target domain sample data 102 and the candidate source domain sample data set 103 to generate target domain sample data 104 and source domain sample data set 105; then, the computing device 101 may determine data type information 106 corresponding to the target domain sample data 104, where the data type information 106 is used to characterize data discontinuity and data sparsity of the target domain sample data 104; further, the computing device 101 may determine a prediction model 107 that matches the data type information 106, wherein the prediction model 107 is trained according to the source domain sample data set 105; finally, the computing device 101 may generate the prediction information 108 based on the prediction model 107 and the target domain sample data 104.

The computing device 101 may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or as a single server or a single terminal device. When the computing device is software, it may be installed in the hardware devices listed above. It may be implemented, for example, as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module. This is not specifically limited herein.

It should be understood that the number of computing devices in FIG. 1 is merely illustrative. There may be any number of computing devices, as implementation needs dictate.

With continued reference to fig. 2, a flow 200 of some embodiments of an information generation method according to the present disclosure is shown. The information generation method comprises the following steps:

Step 201, determining a candidate source domain sample data set according to the candidate target domain sample data.

In some embodiments, the execution subject of the information generation method (e.g., the computing device 101 shown in fig. 1) may determine the candidate source domain sample data set according to the candidate target domain sample data through the following steps:

In the first step, vectorize the candidate target domain sample data to generate vectorized candidate target domain sample data.

The candidate target domain sample data refers to sample data corresponding to the target domain. The execution subject may perform vectorization processing on the candidate target domain sample data through one-hot encoding.

As an example, the candidate target domain sample data may be sales data of item A over a period of time:

Date          Sales volume
2020-01-01    1670
2020-01-02    884
2020-01-03    1169
2020-01-04    1227
...           ...
2020-12-13    321

In the second step, vectorize each piece of source domain data in a source domain data set to generate vectorized source domain data, obtaining a vectorized source domain data set.

The source domain data in the source domain data set may be data corresponding to the source domain. The execution subject may vectorize the source domain data through one-hot encoding to generate vectorized source domain data.

As an example, the source domain data may be sales data for item B over a period of time:

Date          Sales volume
2020-01-01    249
2020-01-02    137
2020-01-03    160
2020-01-04    178
...           ...
2020-12-13    574

In the third step, determine, through a similarity calculation method, a similarity value between the vectorized candidate target domain sample data and each piece of vectorized source domain data in the vectorized source domain data set, obtaining a similarity value set.

The similarity calculation method may be the KNN (K-nearest neighbors) algorithm or the K-means algorithm.

In the fourth step, select from the source domain data set the source domain data whose similarity values satisfy a screening condition as candidate source domain sample data, obtaining the candidate source domain sample data set.

The screening condition may be that the similarity value is equal to or greater than a target value. For example, the target value may be 0.9.
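The screening in step 201 can be sketched in Python as follows. The text names KNN or K-means as the similarity method; this sketch swaps in Pearson correlation between two sales series as the similarity value, which is an assumption chosen for brevity, and applies the example threshold of 0.9. It uses only the first four days listed in the tables above; `item_c` is a hypothetical series, and the function names are hypothetical too.

```python
import math
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def screen_source_domain(target: List[float],
                         sources: Dict[str, List[float]],
                         threshold: float = 0.9) -> Dict[str, float]:
    """Return the source series whose similarity to `target` is >= threshold."""
    scores = {name: pearson(target, series) for name, series in sources.items()}
    return {name: s for name, s in scores.items() if s >= threshold}


item_a = [1670, 884, 1169, 1227]          # first four days of item A
sources = {
    "item_b": [249, 137, 160, 178],       # first four days of item B
    "item_c": [100, 300, 200, 50],        # hypothetical series that moves against item A
}
print(screen_source_domain(item_a, sources))  # keeps only item_b (r is about 0.985)
```

Pearson correlation ignores scale, so item B qualifies even though its sales are several times smaller than item A's; whether that is desirable depends on how the source domain data will be used for training.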

Step 202, performing data cleaning processing on the candidate target domain sample data and the candidate source domain sample data set to generate target domain sample data and a source domain sample data set.

In some embodiments, the execution subject may perform data cleaning on the candidate target domain sample data and the candidate source domain sample data set to generate the target domain sample data and the source domain sample data set, by removing dirty data from both. Dirty data may be data with a missing value. For example, in the record "date: 2020-01-02, sales volume: null", the sales volume is null, so this record may be treated as dirty data. Dirty data may also be data containing an abnormal value. For example, in the record "date: 2020-01-02, sales volume: -20", a sales volume of -20 clearly does not match reality, so this record may also be treated as dirty data.
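A minimal Python sketch of the cleaning rule covering the two kinds of dirty data named above, missing values and negative sales. The (date, sales volume) record layout and the function name `clean` are assumptions; a real pipeline may apply further rules.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (date, sales volume); None marks a missing value.
Record = Tuple[str, Optional[float]]


def clean(records: List[Record]) -> List[Record]:
    """Remove dirty records: missing sales values and negative sales volumes."""
    return [(date, sales) for date, sales in records
            if sales is not None and sales >= 0]


raw = [("2020-01-01", 1670), ("2020-01-02", None),
       ("2020-01-03", -20), ("2020-01-04", 1227)]
print(clean(raw))  # [('2020-01-01', 1670), ('2020-01-04', 1227)]
```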

Step 203, determining data type information corresponding to the target domain sample data.

In some embodiments, the execution subject may determine the data type information corresponding to the target domain sample data through the following steps:

In the first step, determine the number of data items in the target domain sample data whose demand equals the target value.

The data type information may be used to characterize the data discontinuity and data sparsity of the target domain sample data. The data type information may include the proportion of data in the target domain sample data whose demand equals the target value, and the variance of the demand in the target domain sample data. The proportion may represent the data discontinuity of the target domain sample data, and the variance may characterize its data sparsity. For example, the demand in the target domain sample data may represent the sales volume of the target item.

As an example, the target value may be 0. The target domain sample data may be as shown in fig. 3, which includes a date field 301 and a demand field 302. In fig. 3, the number of data items whose demand is 0 may be 6.

In the second step, determine the proportion of data in the target domain sample data whose demand equals the target value by the following formula:

$$F = \frac{M}{N}$$

where F denotes the proportion value, N denotes the total number of demand values in the target domain sample data, and M denotes the number of data items in the target domain sample data whose demand equals the target value.

As an example, the above ratio value may be 0.6.

In the third step, determine the variance of the demand in the target domain sample data through the following formula:

$$S^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$$

where S denotes the standard deviation, so $S^2$ is the variance; N denotes the number of demand values in the target domain sample data; i denotes a serial number; x denotes a demand value in the target domain sample data; $x_i$ denotes the i-th demand value in the target domain sample data; and $\bar{x}$ denotes the mean of the demand values in the target domain sample data.

As an example, the variance of the demand in the target domain sample data may be 3.16.
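A minimal Python sketch of the two quantities computed in this step, assuming the target value is 0 and that the variance is the population form (division by N). The ten demand values below are hypothetical: they were invented so that the outputs reproduce the example numbers above (a proportion of 0.6 and a variance of 3.16) and are not the actual contents of fig. 3.

```python
from statistics import pvariance


def zero_ratio(demand, target=0):
    """F = M / N: share of records whose demand equals the target value."""
    n = len(demand)                            # N: number of demand values
    m = sum(1 for x in demand if x == target)  # M: records equal to the target
    return m / n


def demand_variance(demand):
    """Population variance of the demand (divides by N, an assumption here)."""
    return pvariance(demand)


# Hypothetical daily demand for 2020-01-01 .. 2020-01-10 (values invented).
demand = [5, 0, 0, 4, 0, 2, 0, 0, 0, 1]
print(zero_ratio(demand))                 # 0.6
print(round(demand_variance(demand), 2))  # 3.16
```

If the embodiment instead uses the sample variance (division by N - 1), `statistics.variance` would replace `pvariance`.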

Step 204, a prediction model matched with the data type information is determined.

In some embodiments, the execution subject may determine the prediction model matching the data type information by:

First, in response to determining that the proportion value is smaller than a first target value and the variance is smaller than a second target value, determine a first candidate prediction model as the prediction model matching the data type information.

The first target value and the second target value may be set manually. The first candidate prediction model may be, but is not limited to, either of the following: a CNN (Convolutional Neural Network) model or an RNN (Recurrent Neural Network) model.

As an example, the first target value may be 0.5. The second target value may be 3.

Second, in response to determining that the proportion value is smaller than the first target value and the variance is greater than or equal to the second target value, determine a second candidate prediction model as the prediction model matching the data type information.

The second candidate prediction model may be, but is not limited to, either of the following: a Ridge Regression model or a Stepwise Regression model.
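The two selection rules of step 204 can be sketched in Python as follows, using the example target values 0.5 and 3. The returned labels are hypothetical names; the section does not say which model is chosen when the proportion value is at or above the first target value, so the sketch returns None in that case.

```python
def select_candidate_model(ratio, variance, first_target=0.5, second_target=3.0):
    """Return which candidate model the two rules of step 204 pick, or None."""
    if ratio < first_target and variance < second_target:
        return "first_candidate"   # e.g. a CNN or RNN model
    if ratio < first_target and variance >= second_target:
        return "second_candidate"  # e.g. a ridge or stepwise regression model
    return None                    # ratio >= first target value: not covered here


print(select_candidate_model(0.3, 2.0))  # first_candidate
print(select_candidate_model(0.3, 3.5))  # second_candidate
```

With the example values of step 203 (a proportion of 0.6 and a variance of 3.16), neither rule applies, because 0.6 is not below the example first target value of 0.5.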

The prediction model may be obtained by training on the source domain sample data set. The execution body may vectorize the source domain sample data in the source domain sample data set and then train the prediction model using the resulting vectorized source domain sample data set as the training sample set.

Step 205, generating the prediction information based on the prediction model and the target domain sample data.

In some embodiments, the execution body may generate the prediction information based on the prediction model and the target domain sample data: it may use the vectorized target domain sample data as the input of the prediction model and take the output of the prediction model as the prediction information. For example, the prediction information may characterize the sales volume of the target item over a preset future period.

As an example, the prediction information may be the predicted demand of item A over the coming week, as follows:

Date         Demand (sales volume)
2020-12-01   365.90
2020-12-02   429.95
2020-12-03   447.28
2020-12-04   412.65
2020-12-05   412.65
2020-12-06   418.03
2020-12-07   426.21

The above embodiments of the present disclosure have the following beneficial effects: the information generation method of some embodiments of the present disclosure reduces the waste of inventory resources and improves their utilization rate. Specifically, inventory resources are poorly utilized because existing intermittent demand prediction methods have low prediction accuracy when the data are extremely sparse. Based on this, some embodiments of the present disclosure first determine a candidate source domain sample data set according to the candidate target domain sample data, which provides training samples for the subsequent training of the prediction model. Second, data cleaning is performed on the candidate target domain sample data and the candidate source domain sample data set to generate the target domain sample data and the source domain sample data set. In practice, dirty data degrade both the accuracy of model training and the accuracy of prediction results, and data cleaning removes such data effectively. Third, data type information corresponding to the target domain sample data is determined, where the data type information characterizes the data discontinuity and data sparsity of the target domain sample data. Then, a prediction model matching the data type information is determined. In practice, different data often have different characteristics (for example, different data sparsity and/or different data discontinuity). If a single prediction model is used to predict all types of data, its structure tends to be complex and it needs a large number of training samples, which places high demands on both the volume of training data and the training time.
Therefore, selecting a prediction model that matches the data type information reduces both the complexity of the model structure and the required volume of training sample data. Finally, prediction information is generated based on the prediction model and the target domain sample data. In this way, prediction accuracy is greatly improved, which reduces the waste of inventory resources and improves their utilization rate.

With further reference to fig. 4, a flow 400 of further embodiments of an information generation method is illustrated. The process 400 of the information generating method includes the following steps:

Step 401, determining a candidate source domain sample data set from the candidate target domain sample data through a time series similarity analysis algorithm.

In some embodiments, the execution body determining the candidate source domain sample data set from the candidate target domain sample data through the time series similarity analysis algorithm may include the following steps:

First, sort the demand in the candidate target domain sample data by date to obtain a first demand sequence.

As an example, the first demand sequence obtained from the candidate target domain sample data may be [1670, 884, 1169, 1227, 345, 321].

Second, for each piece of source domain data in the source domain data set, sort the demand in that source domain data by date to obtain a second demand sequence.

As an example, the source domain data may be sales data for item C over a target time period. The second demand sequence may be [123, 123, 1232, 24, 465, 45, 57, 6, 5, 6, 575 ].

Third, determine, through the time series similarity analysis algorithm, the similarity between the first demand sequence and each of the obtained second demand sequences, to obtain similarity values.

The time series similarity analysis algorithm can determine the similarity of demand sequences of different lengths. In practice, the demand data of different items are often recorded differently; for example, demand for item D may be recorded once every 7 days, while demand for item E is recorded once every 8 days. The resulting demand sequences differ, but with the time series similarity analysis algorithm the demand sequences of item D and item E do not need to be aligned, which improves data processing capability.
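The section does not name its time series similarity analysis algorithm. Dynamic time warping (DTW) is one well-known technique that compares sequences of different lengths without aligning them first, so the sketch below uses it as an assumed stand-in. Mapping a DTW distance to a similarity value via 1 / (1 + distance), and the function names, are also illustrative assumptions.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses the absolute difference as the local cost; O(len(a) * len(b)) time.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def similarity(a, b):
    """Map a DTW distance into (0, 1]; identical sequences score 1.0."""
    return 1.0 / (1.0 + dtw_distance(a, b))


def similarity_values(first_seq, second_seqs):
    """Similarity of the first demand sequence to each second demand sequence."""
    return {name: similarity(first_seq, seq) for name, seq in second_seqs.items()}


first = [1670, 884, 1169, 1227, 345, 321]
seconds = {"item C": [123, 123, 1232, 24, 465, 45, 57, 6, 5, 6, 575]}
print(similarity_values(first, seconds))
```

The sequences are compared on their raw scale here; how the resulting similarity values are then used to pick the candidate source domain sample data is not described in this section.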

Step 402, performing data cleaning processing on the candidate target domain sample data and the candidate source domain sample data set to generate target domain sample data and source domain sample data set.

In some embodiments, for the specific implementation of step 402 and its technical effects, reference may be made to step 202 in the embodiments corresponding to fig. 2; details are not repeated here.

Step 403, determining the data groups in the target domain sample data in which the demand is consecutively the target value, to obtain at least one data group.

In some embodiments, the execution body may determine the data groups in the target domain sample data in which the demand is consecutively the target value, to obtain at least one data group.

As an example, the target value may be 0 and the target domain sample data may be as shown in fig. 3. The at least one data group may then be:

[[2020-01-02: 0, 2020-01-03: 0],
 [2020-01-05: 0],
 [2020-01-07: 0, 2020-01-08: 0, 2020-01-09: 0]]

Step 404, determining the data discontinuity value included in the data type information according to the number of data groups in the at least one data group and the interval lengths of those data groups.

In some embodiments, the execution body may determine the data discontinuity value included in the data type information according to the number of data groups in the at least one data group and the interval lengths of those data groups, using the following formula:

$$A = \frac{1}{L}\sum_{i=1}^{L} y_i$$

where A represents the data discontinuity value, i represents a serial number, L represents the number of data groups in the at least one data group, y represents the interval length of a data group in the at least one data group, and $y_i$ represents the interval length of the i-th data group.

As an example, the at least one data group may be:

[[2020-01-02: 0, 2020-01-03: 0],
 [2020-01-05: 0],
 [2020-01-07: 0, 2020-01-08: 0, 2020-01-09: 0]]

The data discontinuity value may then be 2, calculated as A = (2 + 1 + 3) / 3 = 2.
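Steps 403 and 404 can be sketched in Python as below. The sketch assumes the records are already sorted by date with exactly one record per date, so that consecutive positions mean consecutive dates, and that the discontinuity value is the mean run length. Returning 0.0 when there is no run of target values is an assumption; the section does not cover that case. The demand values are hypothetical, with zeros placed on the dates of the example data groups.

```python
from itertools import groupby


def target_runs(demand, target=0):
    """Lengths of maximal runs of consecutive records whose demand equals target."""
    return [
        len(list(group))
        for is_target, group in groupby(demand, key=lambda x: x == target)
        if is_target
    ]


def discontinuity_value(demand, target=0):
    """A = (1 / L) * sum of run lengths y_i, where L is the number of runs."""
    runs = target_runs(demand, target)
    if not runs:
        return 0.0  # assumption: no run of target values means no discontinuity
    return sum(runs) / len(runs)


# Hypothetical series for 2020-01-01 .. 2020-01-10 (nonzero values invented).
demand = [5, 0, 0, 4, 0, 2, 0, 0, 0, 1]
print(target_runs(demand))          # [2, 1, 3]
print(discontinuity_value(demand))  # 2.0
```

`itertools.groupby` only merges adjacent equal keys, which is exactly the "consecutive" condition of step 403.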

step 405, determining the data quantity with the demand as the target value in the target domain sample data.

In some embodiments, the execution principal may determine the data amount of the target domain sample data whose required amount is the target value.

As an example, the above target value may be 0. The target domain sample data may be as shown in fig. 3, where the data amount of the target domain sample data whose demand is the target value may be 6.

Step 406, determining a data sparsity value included in the data type information according to the data quantity and the total data amount in the target domain sample data.

In some embodiments, the execution body may determine the data sparsity value included in the data type information according to the data quantity and the total amount of data in the target domain sample data, using the following formula:

$$P = \frac{B}{D}$$

where P represents the data sparsity value, B represents the data quantity, and D represents the total amount of data.

As an example, the data quantity may be 6 and the total amount of data may be 10, so the data sparsity value may be 6 / 10 = 0.6.

Step 407, in response to determining that the data discontinuity value is greater than the first threshold and the data sparsity value is greater than the second threshold, determining the first target prediction model as a prediction model matched with the data type information.

In some embodiments, the execution body may, in response to determining that the data discontinuity value is greater than a first threshold and the data sparsity value is greater than a second threshold, determine the first target prediction model as the prediction model matching the data type information. The first threshold and the second threshold may be set manually; for example, the first threshold may be 1.32 and the second threshold may be 0.5. The first target prediction model is composed of a target transfer learning classification model and a target regression model. The target transfer learning classification model may be, but is not limited to, any of the following: a CNN model, an RNN model, an RF (Random Forest) model, or a TrAdaBoost classification model. The target regression model may be, but is not limited to, any of the following: a Linear Regression model, a Ridge Regression model, a Stepwise Regression model, a Polynomial Regression model, or an Elastic Net Regression model.

Step 408, in response to determining that the data discontinuity value is greater than the first threshold and the data sparsity value is less than or equal to the second threshold, determining the second target prediction model as the prediction model matching the data type information.

In some embodiments, the execution body may, in response to determining that the data discontinuity value is greater than the first threshold and the data sparsity value is less than or equal to the second threshold, determine the second target prediction model as the prediction model matching the data type information. The second target prediction model may be a Two-Stage TrAdaBoost.R2 model.
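Steps 405 to 408 can be sketched in Python as below, using the example thresholds from step 407 (1.32 and 0.5). The returned labels are hypothetical names, and the demand series is invented. The section does not specify which model is used when the data discontinuity value is at or below the first threshold, so the sketch returns None in that case.

```python
def sparsity_value(demand, target=0):
    """P = B / D: records whose demand equals target over all records."""
    b = sum(1 for x in demand if x == target)  # B: data quantity at the target value
    d = len(demand)                            # D: total amount of data
    return b / d


def select_target_model(a, p, first_threshold=1.32, second_threshold=0.5):
    """Apply the rules of steps 407 and 408 to discontinuity A and sparsity P."""
    if a > first_threshold and p > second_threshold:
        return "first_target"   # transfer learning classifier + regression model
    if a > first_threshold and p <= second_threshold:
        return "second_target"  # e.g. Two-Stage TrAdaBoost.R2
    return None                 # A <= first threshold: not covered by steps 407/408


p = sparsity_value([5, 0, 0, 4, 0, 2, 0, 0, 0, 1])  # hypothetical series
print(p)                            # 0.6
print(select_target_model(2.0, p))  # first_target
```

With the example values A = 2 and P = 0.6, both thresholds are exceeded, so the first target prediction model is selected.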

Optionally, the prediction model may be trained through the following steps:

First, determine the source domain sample data set and/or the target domain sample data as the training sample set.

In response to determining that the prediction model does not include the target transfer learning classification model, the target domain sample data are determined as the training sample set. In response to determining that the prediction model includes the target transfer learning classification model, the source domain sample data set and the target domain sample data are determined as the training sample set.

For example, the target domain sample data may be the actual sales data of an item over a period of time, as shown in the following table:

Date         Actual demand (actual sales volume)
2020-01-01   1670
2020-01-02   884
2020-01-03   1169
2020-01-04   1227
2020-01-05   345
2020-01-06   321

Second, train an initial prediction model on the training sample set to generate the prediction model.

The execution body may first encode each training sample in the training sample set to obtain an encoded training sample set, and then train the initial prediction model on the encoded training sample set to generate the prediction model. The initial prediction model may be, but is not limited to, any of the following: a CNN model, an RNN model, a linear regression model, or a TrAdaBoost classification model.

Optionally, the execution body training the initial prediction model according to the training sample set to generate the prediction model may include the following steps:

First, in response to determining that the training sample proportion of the training sample set is greater than a third threshold, perform positive and negative sample equalization on the training samples in the training sample set to generate an equalized training sample set. The training sample proportion may represent the ratio of the number of training samples whose demand is the target value to the number of training samples whose demand is not the target value. The execution body may reduce the number of training samples whose demand is the target value to balance the positive and negative samples.

Second, train the initial prediction model on the equalized training sample set to generate the prediction model.

The execution body may use the equalized training sample set as the input of the initial prediction model and adjust the weights of the initial prediction model according to the difference between its output and the label set corresponding to the equalized training sample set, thereby training the initial prediction model.
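The equalization step can be sketched in Python as random undersampling. Assumptions not fixed by the section: samples are (features, demand) pairs, target-value samples are cut down until their ratio to the other samples equals the third threshold, the default threshold is 1.0, and a fixed random seed is used for reproducibility.

```python
import random


def equalize(samples, third_threshold=1.0, target=0, seed=0):
    """Undersample samples whose demand equals target when they dominate.

    samples: list of (features, demand) pairs.
    """
    at_target = [s for s in samples if s[1] == target]
    others = [s for s in samples if s[1] != target]
    if not others:
        return list(samples)  # nothing to balance against
    if len(at_target) / len(others) <= third_threshold:
        return list(samples)  # proportion not above the threshold: keep all
    keep = int(third_threshold * len(others))
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    balanced = others + rng.sample(at_target, keep)
    rng.shuffle(balanced)
    return balanced


samples = [([i], 0) for i in range(8)] + [([8], 3), ([9], 5)]
balanced = equalize(samples)
print(len(balanced))  # 4
```

Undersampling keeps every non-target sample, which matters when demand events are rare; the dropped target-value samples are chosen uniformly at random.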

And step 409, constructing a feature vector corresponding to the target domain sample data.

In some embodiments, the executing entity constructing the feature vector corresponding to the target domain sample data may include the following steps:

First, acquire the item information corresponding to the target domain sample data.

The item information may include basic time features, event features, time lag features, time aggregation features, and demand trend features. The event features may characterize whether the date of each record in the target domain sample data is a statutory holiday and/or a promotion date. The time lag features may characterize, for each record in the target domain sample data, the demand lagged by a first target duration and the corresponding difference; for example, the first target duration may be 7 days. The time aggregation features may characterize, for each record, the mean, minimum, maximum, quantiles, skewness, and kurtosis of the demand within a lagged second target duration; for example, the second target duration may be 7 days. The demand trend features may characterize the rate of change of demand over the N days preceding the second target duration lag; for example, N may be 1.

Second, encode the item information to generate the feature vector.
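A Python sketch of these two steps, under several assumptions not fixed by the section: records are daily and sorted by date; holiday and promotion dates are passed in as sets; the lag difference is the first difference of the lagged demand; aggregation windows cover the days immediately before each record, so a record's own demand never enters its features; the trend is a relative change, set to 0.0 when the base demand is zero; and "encoding" is simply a fixed-order list of floats. The function names and feature layout are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean, pstdev, quantiles


def shape_stats(window):
    """Population skewness and excess kurtosis; (0.0, 0.0) for a flat window."""
    m, s = mean(window), pstdev(window)
    if s == 0:
        return 0.0, 0.0
    n = len(window)
    skew = sum((x - m) ** 3 for x in window) / (n * s ** 3)
    kurt = sum((x - m) ** 4 for x in window) / (n * s ** 4) - 3.0
    return skew, kurt


def build_feature_vectors(dates, demand, holidays=frozenset(), promos=frozenset(),
                          lag=7, window=7, n_trend=1):
    """Encode each record with enough history as a fixed-order list of floats."""
    start = max(lag + 1, window, n_trend + 1)
    vectors = []
    for t in range(start, len(demand)):
        d = dates[t]
        past = demand[t - window:t]  # demand strictly before record t
        skew, kurt = shape_stats(past)
        base = demand[t - 1 - n_trend]
        trend = (demand[t - 1] - base) / base if base != 0 else 0.0
        vectors.append([
            float(d.weekday()), float(d.month),            # basic time features
            float(d in holidays), float(d in promos),      # event features
            float(demand[t - lag]),                        # lagged demand
            float(demand[t - lag] - demand[t - lag - 1]),  # lagged difference
            float(mean(past)), float(min(past)), float(max(past)),
            float(quantiles(past, n=4)[1]),                # median cut point
            skew, kurt,                                    # shape of the window
            trend,                                         # demand trend
        ])
    return vectors


start_day = date(2020, 1, 1)
dates = [start_day + timedelta(days=k) for k in range(14)]
demand = [5, 0, 0, 4, 0, 2, 0, 0, 0, 1, 3, 0, 2, 0]  # hypothetical values
vecs = build_feature_vectors(dates, demand, holidays={date(2020, 1, 12)})
print(len(vecs), len(vecs[0]))  # 6 13
```

The first `lag + 1` records are skipped because they lack the history the lag features need; a production pipeline might pad them instead.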

And step 410, generating prediction information according to the target domain sample data, the characteristic vector and the prediction model.

In some embodiments, the execution agent may use the target domain sample data and the feature vector as inputs of the prediction model, and use an output of the prediction model as the prediction information.

Step 411, controlling the allocation device according to the prediction information to replenish the item corresponding to the target domain sample data.

In some embodiments, the execution body may control the allocation device according to the prediction information to replenish the item corresponding to the target domain sample data.

As an example, the prediction information may be "date: 2020-4-20, estimated demand: 300". The execution body may control an information transmission device to send the prediction information to a terminal of the supplier as replenishment information.

As can be seen from fig. 4, compared with the description of the embodiments corresponding to fig. 2, the present disclosure first uses the formulas for the data sparsity value and the data discontinuity value so that these values reflect the data characteristics of the target domain sample data more precisely. Second, the first target prediction model is composed of a target transfer learning classification model and a target regression model; the target transfer learning classification model avoids the time cost of rebuilding a model. In addition, predictions for target domain sample data with a high proportion of zero demand are more accurate.

With further reference to fig. 5, as an implementation of the methods illustrated in the above figures, the present disclosure provides some embodiments of an information generating apparatus, which correspond to those illustrated in fig. 2, and which may be particularly applied in various electronic devices.

As shown in fig. 5, the information generating apparatus 500 of some embodiments includes: a first determination unit 501, a data cleansing processing unit 502, a second determination unit 503, a third determination unit 504, and a generation unit 505. The first determining unit 501 is configured to determine a candidate source domain sample data set according to candidate target domain sample data; a data cleaning processing unit 502 configured to perform data cleaning processing on the candidate target domain sample data and the candidate source domain sample data set to generate a target domain sample data and a source domain sample data set; a second determining unit 503, configured to determine data type information corresponding to the target domain sample data, where the data type information is used to characterize data discontinuity and data sparsity of the target domain sample data; a third determining unit 504, configured to determine a prediction model matching the data type information, where the prediction model is trained according to the source domain sample data set; a generating unit 505 configured to generate prediction information based on the prediction model and the target domain sample data.

In some optional implementations of some embodiments, the generating unit 505 is configured to: construct a feature vector corresponding to the target domain sample data; and generate the prediction information according to the target domain sample data, the feature vector, and the prediction model.

In some optional implementations of some embodiments, the second determining unit 503 is configured to: determine the data groups in the target domain sample data in which the demand is consecutively the target value, to obtain at least one data group; and determine the data discontinuity value included in the data type information according to the number of data groups in the at least one data group and the interval lengths of those data groups.

In some optional implementations of some embodiments, the second determining unit 503 is configured to: determine the number of records in the target domain sample data whose demand is the target value; and determine the data sparsity value included in the data type information according to that number and the total amount of data in the target domain sample data.

In some optional implementations of some embodiments, the third determining unit 504 is configured to: in response to determining that the data discontinuity value is greater than a first threshold and the data sparsity value is greater than a second threshold, determine a first target prediction model as the prediction model matching the data type information, where the first target prediction model is composed of a target transfer learning classification model and a target regression model.

In some optional implementations of some embodiments, the third determining unit 504 is configured to: in response to determining that the data discontinuity value is greater than the first threshold and the data sparsity value is less than or equal to the second threshold, determine a second target prediction model as the prediction model matching the data type information.

In some optional implementations of some embodiments, the prediction model is trained by: determining the source domain sample data set and/or the target domain sample data as a training sample set; and training an initial prediction model according to the training sample set to generate the prediction model.

In some optional implementations of some embodiments, training the initial prediction model according to the training sample set to generate the prediction model includes: in response to determining that the training sample proportion of the training sample set is greater than a third threshold, performing positive and negative sample equalization on the training samples in the training sample set to generate an equalized training sample set; and training the initial prediction model on the equalized training sample set to generate the prediction model.

In some optional implementations of some embodiments, the first determining unit 501 is configured to: determine the candidate source domain sample data set from the candidate target domain sample data through a time series similarity analysis algorithm.

In some optional implementations of some embodiments, the apparatus 500 further includes a unit configured to control an allocation device according to the prediction information to replenish the item corresponding to the target domain sample data.

It will be understood that the elements described in the apparatus 500 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 500 and the units included therein, and are not described herein again.

Referring now to fig. 6, a block diagram of an electronic device 600 (such as the computing device 101 shown in fig. 1) suitable for implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.

As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.

In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of some embodiments of the present disclosure.

It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determining a candidate source domain sample data set according to the candidate target domain sample data; performing data cleaning processing on the candidate target domain sample data and the candidate source domain sample data set to generate target domain sample data and source domain sample data set; determining data type information corresponding to the target domain sample data, wherein the data type information is used for representing data discontinuity and data sparsity of the target domain sample data; determining a prediction model matched with the data type information, wherein the prediction model is obtained by training according to the source domain sample data set; and generating prediction information based on the prediction model and the target domain sample data.

Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor includes a first determination unit, a data cleansing processing unit, a second determination unit, a third determination unit, and a generation unit. The names of these units do not in some cases limit the units themselves; for example, the generation unit may also be described as "a unit that generates prediction information corresponding to the target domain sample data based on the prediction model".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

The foregoing description is merely a description of the preferred embodiments of the present disclosure and an illustration of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the embodiments of the present disclosure.
