Method for forecasting hail and short-time heavy rainfall based on GBDT + LR model

Document No.: 1427966   Publication date: 2020-03-17

Reading note: the technology "Method for forecasting hail and short-time heavy rainfall based on GBDT + LR model" was designed and created by 路志英 (Lu Zhiying) and 汪永清 (Wang Yongqing) on 2019-11-29. Abstract: The invention discloses a method for forecasting hail and short-time heavy precipitation, comprising the following steps: acquiring, for a given area, surface meteorological observation station data for the three hours before hail and short-time heavy precipitation events occurring from March to September of past years, together with data from a plurality of sounding stations upstream of the area; expanding the hail process data, which are relatively few, with the SMOTE oversampling algorithm to obtain an oversampled data set; reducing the dimensionality of the oversampled data set with PCA; dividing the samples of the reduced data set into a training set and a test set; constructing a GBDT + LR model in which the features extracted by the leaf nodes of the GBDT model serve as the input features of the LR model, and training and testing the GBDT + LR model on the samples of the training set and the test set; and collecting surface meteorological observation station data for the three hours before the time point to be predicted in the area, acquiring data from the plurality of upstream sounding stations, feeding these data into the trained GBDT + LR model, and determining whether hail or short-time heavy precipitation will occur at the time point to be predicted.

1. A method for forecasting hail and short-time heavy precipitation based on a GBDT + LR model, comprising the following steps:

S1, raw data acquisition: acquiring surface meteorological observation station data for the three hours before hail and short-time heavy precipitation events occurring from March to September of each year in a given area, and acquiring data from a plurality of sounding stations upstream of the area;

S2, expanding the hail process data, which are relatively few, with the SMOTE oversampling algorithm to obtain an oversampled data set;

S3, reducing the dimensionality of the oversampled data set with PCA;

S4, data set partitioning: dividing the samples of the reduced data set into a training set and a test set;

S5, constructing a GBDT + LR model in which the features extracted by the leaf nodes of the GBDT model are used as the input features of the LR model, and training and testing the GBDT + LR model with the samples of the training set and the test set;

S6, collecting surface meteorological observation station data for the three hours before the time point to be predicted in the area, and acquiring data from the plurality of upstream sounding stations;

S7, performing PCA dimensionality reduction on the data from S6, inputting the result into the trained GBDT + LR model, and determining whether hail or short-time heavy precipitation will occur at the predicted time point.

2. The method for forecasting hail and short-time heavy precipitation according to claim 1, characterized in that the GBDT + LR model is constructed as follows:

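(1) A GBDT model is constructed as follows:

$$F(x;P)=\sum_{n=1}^{N}\beta_n\,h(x;\alpha_n)$$

where $h(x;\alpha_n)$ is the $n$-th base learner, $\beta_n$ is the weight of each base learner, $\alpha_n$ is the parameter of each base learner, and the parameter set $P=\{\beta_n,\alpha_n\}_{n=1}^{N}$ is the optimal solution that minimizes the loss function over the $M$ data $(x_i, y_i)$.

Let the loss function be

$$L(P)=\sum_{i=1}^{M} l\big(y_i,F(x_i;P)\big)$$

where $l$ is the loss function of the base learner at each iteration. Then:

$$P^{*}=\arg\min_{P} L(P)$$

$$(\beta_n,\alpha_n)=\arg\min_{\beta,\alpha}\sum_{i=1}^{M} l\big(y_i,F_{n-1}(x_i)+\beta\,h(x_i;\alpha)\big)$$

For each sample $x_i$, a gradient descent direction can be obtained, namely:

$$\tilde{y}_i=-\left[\frac{\partial\, l\big(y_i,F(x_i)\big)}{\partial F(x_i)}\right]_{F(x)=F_{n-1}(x)},\qquad i=1,\dots,M$$

and the base learner is fitted to these directions by least squares:

$$\alpha_n=\arg\min_{\alpha,\rho}\sum_{i=1}^{M}\big[\tilde{y}_i-\rho\,h(x_i;\alpha)\big]^{2}\tag{8}$$

Optimizing equation (8) yields $\alpha_n$, from which $\beta_n$ is further obtained:

$$\beta_n=\arg\min_{\beta}\sum_{i=1}^{M} l\big(y_i,F_{n-1}(x_i)+\beta\,h(x_i;\alpha_n)\big)$$

Finally, the iterative description of the GBDT algorithm model is obtained:

$$F_n(x)=F_{n-1}(x)+\beta_n\,h(x;\alpha_n)\tag{13}$$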

(2) A binary LR classification model based on the Sigmoid function is connected in series after the GBDT model, where the Sigmoid function is:

$$P(y=1\mid x;\theta)=\frac{1}{1+\exp\left(-\theta^{T}x\right)}$$

where θ is the weight coefficient vector of the model and x is the feature vector formed from the leaf nodes extracted by the GBDT model.

3. The method for forecasting hail and short-time heavy precipitation according to claim 2, characterized in that the maximum number of iterations of the GBDT + LR model is 10, the learning rate is 0.02, and the maximum tree depth is 4.

4. The method for forecasting hail and short-time heavy precipitation according to claim 3, characterized in that, in step S2, expanding the hail process data with the SMOTE oversampling algorithm comprises the following steps:

1) for each sample $x_i$ of the hail process data, calculating the Euclidean distance from that sample to every other sample in the hail process data;

2) setting the sampling ratio according to the class proportions of the samples and selecting several samples from among the nearest neighbours of $x_i$ within the same (hail) class, the selected neighbours being denoted $\hat{x}_{ij}$;

3) for each randomly selected neighbour $\hat{x}_{ij}$, constructing a new sample point according to equation (1):

$$x_{\text{new}}=x_i+\operatorname{rand}(0,1)\times\left(\hat{x}_{ij}-x_i\right)\tag{1}$$

thereby expanding the hail process data.

5. The method for forecasting hail and short-time heavy precipitation according to claim 4, characterized in that, in step S1, the surface meteorological observation station data comprise surface air pressure, sea-level air pressure, temperature, dew-point temperature, relative humidity, water vapour pressure, 2-minute average wind direction, 2-minute average wind speed, 10-minute average wind direction and 10-minute average wind speed.

6. The method for forecasting hail and short-time heavy precipitation according to claim 5, characterized in that, in step S1, the sounding station data comprise: convective available potential energy CAPE (J·kg⁻¹), best convective available potential energy BCAPE (J·kg⁻¹), convective inhibition CIN (J·kg⁻¹), K index KI, Showalter index SI, lifted index LI, best lifted index BLI, modified K index MK, deep convective index DCI, modified deep convective index MDCI, microburst day potential index MDPI, convective stability index IC, best convective stability index BIC, conditional stability index IL, conditional-convective stability index ICL, total totals index TT, precipitable water PW (cm), convective condensation level CCL (hPa), convective temperature TCON (°C), lifting condensation temperature TC (°C), lifting condensation level PC (hPa), level of free convection LFC (hPa), equilibrium level PE (hPa), 0 °C layer height ZH (gpm), -30 °C layer height FH (gpm), severe weather threat index SWEAT, wind index WINDEX, storm-relative helicity SRH, energy-helicity index EHI, bulk Richardson number BRN, storm strength index SSI, SWISS thunderstorm index SWISS00 and SWISS thunderstorm index SWISS12.

7. The method for forecasting hail and short-time heavy precipitation according to claim 6, characterized in that the area is Tianjin; the plurality of sounding stations are the Beijing, chenchenge, nutlet, Chifeng and Zhangjiakou sounding meteorological stations; and the surface meteorological observation station data are the data of the Tianjin weather station for the three hours before hail and short-time heavy precipitation events from March to September of each year from 2006 to 2018, comprising 55 hail process records and 397 short-time heavy precipitation process records collected over 2006 to 2018.

8. The method for forecasting hail and short-time heavy precipitation according to claim 7, characterized in that the hail process data are expanded to 385 records with the SMOTE oversampling algorithm, and the dimensionality of the hail and short-time heavy precipitation data set is reduced from 195 to 30 with PCA.

9. The method for forecasting hail and short-time heavy precipitation according to claim 8, characterized in that the reduced data set is divided into a training set and a test set in a ratio of 8:2.
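The sample counts in claims 7 to 9 can be cross-checked with simple arithmetic. The sketch below assumes that the 8:2 split is applied to the combined oversampled set (385 hail plus 397 short-time heavy precipitation samples) and that the test share is rounded up, as scikit-learn's `train_test_split` does for a fractional `test_size`; the claims state neither the rounding rule nor whether the split is stratified.

```python
import math

hail_raw = 55             # hail process records, 2006-2018 (claim 7)
precip = 397              # short-time heavy precipitation records (claim 7)
hail_oversampled = 385    # hail records after SMOTE (claim 8)

synthetic = hail_oversampled - hail_raw    # points generated by SMOTE
per_sample = synthetic / hail_raw          # synthetic points per original hail record
total = hail_oversampled + precip          # samples entering PCA and the 8:2 split

n_test = math.ceil(0.2 * total)            # assumption: test share rounded up
n_train = total - n_test

print(synthetic, per_sample, total, n_train, n_test)
```

Under these assumptions SMOTE adds 330 points (six per original hail record), leaving 782 samples that split into 625 for training and 157 for testing.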

Technical Field

The invention relates to the field of weather forecast, in particular to a hail and short-time heavy precipitation forecasting method.

Background

In weather forecasting, hail and short-time heavy precipitation are characterized by short life cycles, small affected areas and abrupt weather changes. They have a great impact on industry, agriculture and people's daily lives.

Weather radar can be used to forecast hail and short-time heavy precipitation, but it only reflects current (real-time) conditions and its detection spatial scale is small, so it cannot provide forecasts with a long lead time.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention aims to provide a method for forecasting hail and short-time heavy precipitation based on a GBDT + LR model, which forecasts hail and short-time heavy precipitation accurately by exploiting relationships in physical-field data and thereby provides strong support for accurate forecasting of severe convective weather.

To this end, the invention adopts the following technical scheme:

A method for forecasting hail and short-time heavy precipitation based on a GBDT + LR model comprises the following steps:

S1, raw data acquisition: acquiring surface meteorological observation station data for the three hours before hail and short-time heavy precipitation events occurring from March to September of each year in a given area, and acquiring data from a plurality of sounding stations upstream of the area;

S2, expanding the hail process data, which are relatively few, with the SMOTE oversampling algorithm to obtain an oversampled data set;

S3, reducing the dimensionality of the oversampled data set with PCA;

S4, data set partitioning: dividing the samples of the reduced data set into a training set and a test set;

S5, constructing a GBDT + LR model in which the features extracted by the leaf nodes of the GBDT model are used as the input features of the LR model, and training and testing the GBDT + LR model with the samples of the training set and the test set;

S6, collecting surface meteorological observation station data for the three hours before the time point to be predicted in the area, and acquiring data from the plurality of upstream sounding stations;

S7, performing PCA dimensionality reduction on the data from S6, inputting the result into the trained GBDT + LR model, and determining whether hail or short-time heavy precipitation will occur at the predicted time point.
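Step S7 applies PCA to the newly collected data. A common practice, assumed here because the text does not spell it out, is to reuse the mean vector and principal components fitted on the training data, so that a new observation is mapped into the same reduced space the GBDT + LR model was trained on instead of being re-fitted on its own. A minimal sketch of that projection, with a hypothetical function name and toy fitted values:

```python
import math


def project(sample, mean, components):
    """Project one raw feature vector onto already-fitted principal components.

    mean:       per-feature means estimated on the training data
    components: unit-length principal directions, one per output dimension
    """
    if len(sample) != len(mean):
        raise ValueError("sample and mean must have the same length")
    centered = [x - m for x, m in zip(sample, mean)]  # centre with the training mean
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


# Hypothetical fitted values for a 2-feature toy problem with one component
train_mean = [2.5, 5.0]
train_components = [[1 / math.sqrt(5), 2 / math.sqrt(5)]]
z = project([5.0, 10.0], train_mean, train_components)
print(z)
```

In the embodiment, each raw vector would have 195 entries and `components` would hold 30 directions.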

The GBDT + LR model is constructed by the following process:

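(1) A GBDT model is constructed as follows:

$$F(x;P)=\sum_{n=1}^{N}\beta_n\,h(x;\alpha_n)$$

where $h(x;\alpha_n)$ is the $n$-th base learner, $\beta_n$ is the weight of each base learner, $\alpha_n$ is the parameter of each base learner, and the parameter set $P=\{\beta_n,\alpha_n\}_{n=1}^{N}$ is the optimal solution that minimizes the loss function over the $M$ data $(x_i, y_i)$.

Let the loss function be

$$L(P)=\sum_{i=1}^{M} l\big(y_i,F(x_i;P)\big)$$

where $l$ is the loss function of the base learner at each iteration. Then:

$$P^{*}=\arg\min_{P} L(P)$$

$$(\beta_n,\alpha_n)=\arg\min_{\beta,\alpha}\sum_{i=1}^{M} l\big(y_i,F_{n-1}(x_i)+\beta\,h(x_i;\alpha)\big)$$

For each sample $x_i$, a gradient descent direction can be obtained, namely:

$$\tilde{y}_i=-\left[\frac{\partial\, l\big(y_i,F(x_i)\big)}{\partial F(x_i)}\right]_{F(x)=F_{n-1}(x)},\qquad i=1,\dots,M$$

and the base learner is fitted to these directions by least squares:

$$\alpha_n=\arg\min_{\alpha,\rho}\sum_{i=1}^{M}\big[\tilde{y}_i-\rho\,h(x_i;\alpha)\big]^{2}\tag{8}$$

Optimizing equation (8) yields $\alpha_n$, from which $\beta_n$ is further obtained:

$$\beta_n=\arg\min_{\beta}\sum_{i=1}^{M} l\big(y_i,F_{n-1}(x_i)+\beta\,h(x_i;\alpha_n)\big)$$

Finally, the iterative description of the GBDT algorithm model is obtained:

$$F_n(x)=F_{n-1}(x)+\beta_n\,h(x;\alpha_n)\tag{13}$$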

(2) A binary LR classification model based on the Sigmoid function is connected in series after the GBDT model, where the Sigmoid function is:

$$P(y=1\mid x;\theta)=\frac{1}{1+\exp\left(-\theta^{T}x\right)}$$

where θ is the weight coefficient vector of the model and x is the feature vector formed from the leaf nodes extracted by the GBDT model.
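In a GBDT + LR cascade, each sample is passed through every tree, the index of the leaf it reaches in each tree is one-hot encoded, and the concatenated binary vector becomes the input x of the sigmoid model above. The sketch below shows only that encoding and a logistic regression stage trained by batch gradient descent on the log-loss. The leaf indices, the two-tree/four-leaf layout, the label coding (1 = hail) and all function names are hypothetical toy choices, not outputs or code of the patented model.

```python
import math


def one_hot_leaves(leaf_indices, leaves_per_tree):
    """Concatenate one one-hot block per tree; leaf_indices[t] is the leaf hit in tree t."""
    vec = []
    for leaf, n_leaves in zip(leaf_indices, leaves_per_tree):
        block = [0.0] * n_leaves
        block[leaf] = 1.0
        vec.extend(block)
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(theta, x):
    """P(y = 1 | x; theta); theta[0] is the bias term."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))


def train_lr(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss."""
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            err = predict_proba(theta, xi) - yi  # d(log-loss)/d(logit)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        theta = [t - lr * g / len(X) for t, g in zip(theta, grad)]
    return theta


# Hypothetical leaf hits in two trees with 4 leaves each; label 1 = hail, 0 = heavy precipitation
leaves = [(0, 1), (1, 1), (2, 3), (3, 2)]
labels = [1, 1, 0, 0]
X = [one_hot_leaves(l, [4, 4]) for l in leaves]
theta = train_lr(X, labels)
print([round(predict_proba(theta, x), 3) for x in X])
```

Assuming one tree per boosting iteration, the tuned setting of 10 iterations with a maximum depth of 4 gives at most 2^4 = 16 leaves per tree, so the one-hot input to the LR stage would have at most 160 entries.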

By tuning the parameters of the GBDT and LR models, it was found that the hail and short-time heavy precipitation forecast model performs best when the maximum number of iterations is 10, the learning rate is 0.02 and the maximum tree depth is 4.
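The iteration count and learning rate tuned above enter the model through the stagewise update F_n(x) = F_{n-1}(x) + β_n h(x; α_n). The toy sketch below illustrates that update under assumptions that differ from the patent's model: it uses squared-error loss (so the gradient-descent direction is simply the residual), one-feature regression stumps instead of depth-4 trees, and a fixed shrinkage factor `learning_rate` in place of a line-searched β_n. The function names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature: alpha = (threshold, left, right)."""
    values = sorted(set(xs))
    best_sse, best_alpha = None, None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_alpha = sse, (t, lv, rv)
    return best_alpha


def h(x, alpha):
    """Base learner h(x; alpha)."""
    t, lv, rv = alpha
    return lv if x <= t else rv


def gradient_boost(xs, ys, n_iter=10, learning_rate=0.5):
    """Squared-loss boosting with beta_n replaced by a fixed learning_rate (assumption).

    xs must contain at least two distinct values.
    """
    f0 = sum(ys) / len(ys)  # F_0: the constant that minimises squared loss
    alphas = []

    def F(x):
        return f0 + sum(learning_rate * h(x, a) for a in alphas)

    for _ in range(n_iter):
        # negative gradient of 0.5 * (y - F)^2 with respect to F is the residual y - F
        residuals = [y - F(x) for x, y in zip(xs, ys)]
        alphas.append(fit_stump(xs, residuals))  # alpha_n; F now includes stage n
    return F


xs, ys = [1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0]
F = gradient_boost(xs, ys)
F_slow = gradient_boost(xs, ys, learning_rate=0.02)
print(F(1.0), F(4.0), F_slow(1.0))
```

In this toy, where every stump fits the residuals exactly, each stage closes the fraction `learning_rate` of the remaining residual. With `learning_rate=0.02` and ten stages, as in the tuned setting, `F_slow(1.0)` is 0.5 × 0.98^10 ≈ 0.41 rather than approaching 0; in the GBDT + LR scheme the LR stage consumes the trees' leaf assignments rather than this raw boosted score.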

In step S2, expanding the hail process data with the SMOTE oversampling algorithm comprises the following steps:

1) for each sample $x_i$ of the hail process data, calculating the Euclidean distance from that sample to every other sample in the hail process data;

2) setting the sampling ratio according to the class proportions of the samples and selecting several samples from among the nearest neighbours of $x_i$ within the same (hail) class, the selected neighbours being denoted $\hat{x}_{ij}$;

3) for each randomly selected neighbour $\hat{x}_{ij}$, constructing a new sample point according to equation (1):

$$x_{\text{new}}=x_i+\operatorname{rand}(0,1)\times\left(\hat{x}_{ij}-x_i\right)\tag{1}$$

thereby expanding the hail process data.
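Steps 1) to 3) can be sketched as follows, assuming the standard SMOTE interpolation x_new = x_i + rand(0,1)·(x̂ − x_i) with neighbours drawn from the k nearest hail samples. The value k = 5, the six synthetic points per record and the random 3-feature data are illustrative assumptions standing in for real hail records. With 55 records, six points each gives 55 + 330 = 385, the count reported in the embodiment, but the patent does not state the settings it used.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_per_sample, k=5, seed=0):
    """Return n_per_sample synthetic points for every minority-class sample."""
    rng = random.Random(seed)
    synthetic = []
    for i, x_i in enumerate(minority):
        # step 1: Euclidean distances from x_i to the other minority samples
        others = [x for j, x in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda x: euclidean(x_i, x))[:k]
        for _ in range(n_per_sample):
            # steps 2-3: random neighbour, then x_new = x_i + rand(0,1) * (x_hat - x_i)
            x_hat = rng.choice(neighbours)
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(x_i, x_hat)])
    return synthetic


# Hypothetical stand-in for 55 hail records with 3 features each
data_rng = random.Random(1)
hail = [[data_rng.uniform(0, 10) for _ in range(3)] for _ in range(55)]
new_points = smote(hail, n_per_sample=6)
print(len(hail) + len(new_points))
```

Because every new point is a convex combination of two hail records, the synthetic samples stay inside the region spanned by the original hail class.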

In step S1, the surface meteorological observation station data comprise surface air pressure, sea-level air pressure, temperature, dew-point temperature, relative humidity, water vapour pressure, 2-minute average wind direction, 2-minute average wind speed, 10-minute average wind direction and 10-minute average wind speed.

In step S1, the sounding station data comprise: convective available potential energy CAPE (J·kg⁻¹), best convective available potential energy BCAPE (J·kg⁻¹), convective inhibition CIN (J·kg⁻¹), K index KI, Showalter index SI, lifted index LI, best lifted index BLI, modified K index MK, deep convective index DCI, modified deep convective index MDCI, microburst day potential index MDPI, convective stability index IC, best convective stability index BIC, conditional stability index IL, conditional-convective stability index ICL, total totals index TT, precipitable water PW (cm), convective condensation level CCL (hPa), convective temperature TCON (°C), lifting condensation temperature TC (°C), lifting condensation level PC (hPa), level of free convection LFC (hPa), equilibrium level PE (hPa), 0 °C layer height ZH (gpm), -30 °C layer height FH (gpm), severe weather threat index SWEAT, wind index WINDEX, storm-relative helicity SRH, energy-helicity index EHI, bulk Richardson number BRN, storm strength index SSI, SWISS thunderstorm index SWISS00 and SWISS thunderstorm index SWISS12.

In an embodiment of the invention, the area is Tianjin; the plurality of sounding stations are the Beijing, Schchen platform, nutlet, Chifeng and Zhangjiakou sounding meteorological stations; and the surface meteorological observation station data are the data of the Tianjin Meteorological Office for the three hours before hail and short-time heavy precipitation events from March to September of each year from 2006 to 2018, comprising 55 hail process records and 397 short-time heavy precipitation process records. The hail process data are expanded to 385 records with the SMOTE oversampling algorithm; the dimensionality of the hail and short-time heavy precipitation data set is reduced from 195 to 30 with principal component analysis (PCA); and the reduced data set is divided into a training set and a test set in a ratio of 8:2.
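The PCA step of this embodiment (195 features reduced to 30 components) can be illustrated with a small dependency-free sketch. It finds the top-k principal directions by power iteration on the sample covariance matrix, deflating the matrix after each component. This is a swapped-in method chosen for readability, not the authors' implementation; practical code would normally call an SVD-based routine. The function name and the toy rows are hypothetical.

```python
import math
import random


def pca(rows, k, iters=500, seed=0):
    """Top-k principal components via power iteration with deflation.

    Returns (mean, components, scores); components are unit vectors.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # sample covariance matrix (d x d)
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(c * x for c, x in zip(row, v)) for row in cov]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        # Rayleigh quotient: variance along v (the eigenvalue)
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # deflation: remove this direction so the next pass finds the next component
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    scores = [[sum(x * c for x, c in zip(r, comp)) for comp in components]
              for r in centered]
    return mean, components, scores


rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # toy data on the line y = 2x
mean, components, scores = pca(rows, k=1)
print(mean, components, scores)
```

The returned `mean` and `components` are the quantities a later projection of newly collected observations would reuse.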

The invention has the following beneficial effects:

1. The method for forecasting hail and short-time heavy precipitation based on the GBDT + LR model uses surface physical-field data from a meteorological observation station together with meteorological data from the sounding stations upstream of it, trains and fits the GBDT + LR model on these data, and thereby learns the relationship between the data and the occurrence of hail and short-time heavy precipitation. Because the physical-field data are recorded hourly, the forecast lead time can be extended, so forecasts can be issued one or several hours in advance.

2. The GBDT + LR model performs well: for hail, the hit rate is 0.902 and the critical success index is 0.859; for short-time heavy precipitation, the hit rate is 0.946 and the critical success index is 0.855. The method can therefore forecast these events accurately and reduce the impact of hail and short-time heavy precipitation on society.
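The hit rate and critical success index reported above are standard contingency-table verification scores. The sketch below assumes that "hit rate" means the probability of detection, POD = hits / (hits + misses), and computes CSI = hits / (hits + misses + false alarms), treating each class in turn as the positive event. The labels are invented for illustration and do not reproduce the patent's results.

```python
def pod(hits, misses):
    """Probability of detection (assumed meaning of 'hit rate')."""
    return hits / (hits + misses)


def csi(hits, misses, false_alarms):
    """Critical success index (threat score)."""
    return hits / (hits + misses + false_alarms)


def scores_for_class(y_true, y_pred, positive):
    """Return (POD, CSI) with `positive` treated as the forecast event."""
    pairs = list(zip(y_true, y_pred))
    hits = sum(t == positive and p == positive for t, p in pairs)
    misses = sum(t == positive and p != positive for t, p in pairs)
    false_alarms = sum(t != positive and p == positive for t, p in pairs)
    return pod(hits, misses), csi(hits, misses, false_alarms)


# Hypothetical labels: "H" = hail, "R" = short-time heavy precipitation
y_true = ["H", "H", "H", "H", "R", "R", "R", "R", "R"]
y_pred = ["H", "H", "H", "R", "R", "R", "R", "H", "R"]
print(scores_for_class(y_true, y_pred, "H"))
print(scores_for_class(y_true, y_pred, "R"))
```

Because CSI also penalizes false alarms, it can never exceed POD for the same class, which is consistent with the pairs of values reported above.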

Drawings

FIG. 1 is a distribution map of the sounding stations used in Embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of the SMOTE algorithm;

FIG. 3 is a diagram of the GBDT + LR model in an embodiment of the invention;

FIG. 4 is a graph of the number of tree iterations versus the critical success index in an embodiment of the present invention;

FIG. 5 is a graph of the learning rate versus the critical success index in an embodiment of the present invention;

FIG. 6 is a graph of the maximum tree depth versus the critical success index in an embodiment of the present invention.

Detailed Description

The method of the present invention is described in detail below with reference to the accompanying drawings and examples.
