Strong convection weather duration forecasting method based on ensemble learning

Document No.: 1534105  Published: 2020-02-14

This technology, "Strong convection weather duration forecasting method based on ensemble learning", was created by Wen Liyu (文立玉), Luo Fei (罗飞) and Xiang Yuanji (向元吉) on 2019-10-22. Abstract: The invention discloses an ensemble-learning-based method for forecasting the duration of strong convective weather, comprising the following steps: S1, data source selection: select the surface meteorological station data of the forecast area and the data of the two sounding stations closest to it; S2, data preprocessing: remove erroneous and missing records, take the calculated strong convection forecast parameters as input and the duration of each strong convective weather event as output (the duration is 0 on days without strong convective weather), and normalize the forecast parameters, i.e., the input; S3, algorithm selection: select the K-nearest neighbor, polynomial regression, decision tree and neural network algorithms. The method mainly uses the meteorological elements of the day on which strong convective weather occurs to infer how long it is likely to last. Using a strategy of comparing multiple machine learning algorithms, it trains and tests each on the target task and selects the best-performing learning algorithm for use in actual forecasting.

1. An ensemble-learning-based method for forecasting the duration of strong convective weather, characterized by comprising the following steps:

S1, data source selection: select the surface meteorological station data of the forecast area and the data of the two sounding stations closest to the forecast area;

S2, data preprocessing: remove erroneous and missing records; take the calculated strong convection forecast parameters as input and the duration of each strong convective weather event as output, setting the duration to 0 on days without strong convective weather; and normalize the forecast parameters, i.e., the input;

S3, algorithm selection: select the K-nearest neighbor algorithm, the polynomial regression algorithm, the decision tree algorithm and the neural network algorithm;

S4, ensemble learning execution flow: take the normalized meteorological feature data as input and train by cross-validation. Divide the data into 10 parts; the first 9 parts are used to train and validate each algorithm and the last part is reserved for testing. Within the first 9 parts, each training run uses 8 parts for training and 1 part for validating model accuracy, so that each algorithm yields 9 models and 9 validation scores. The highest-scoring model is selected as the optimal model of that algorithm and is tested on the last part, and the test score is taken as the final score of the algorithm. With an optimal model for each of the four algorithms tested on the part held out at the start, the two best algorithms by test score are selected as the final algorithms, and their predictions are summed and averaged to obtain the ensemble learning prediction.

2. The ensemble-learning-based method for forecasting strong convective weather duration according to claim 1, wherein the relevant strong convection forecast parameters include the whole-layer integrated specific humidity, the A index, the K index, the modified K index, the total index, the modified total index, the effective convective potential energy, the convective condensation temperature, the free-lift convective temperature, the 0-3 km vertical wind vector difference (wind shear), the severe weather threat index, 700 hPa specific humidity, 700 hPa relative humidity, 850 hPa specific humidity, 850 hPa relative humidity and surface dew point temperature.

3. The ensemble-learning-based method for forecasting strong convective weather duration according to claim 1, wherein the normalization uses mean-variance normalization to transform all input data to a distribution with a mean of 0 and a variance of 1.

4. The ensemble-learning-based method for forecasting strong convective weather duration according to claim 1, wherein the K-nearest neighbor algorithm is used when the meteorological elements vary within a stable range, i.e., exhibit a continuous form.

5. The ensemble-learning-based method for forecasting strong convective weather duration according to claim 1, wherein the polynomial regression algorithm is used when the weather forecast involves many elements whose relationship to the result is hardly linear.

6. The ensemble-learning-based method for forecasting strong convective weather duration according to claim 1, wherein the decision tree algorithm is used when each meteorological element plays a greater or lesser role in the development of the strong convective weather process.

7. The ensemble-learning-based method for forecasting strong convective weather duration according to claim 1, wherein the neural network algorithm is used when the relationship between the strong convective weather duration and the meteorological elements of the day is complex.

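8. The ensemble-learning-based method for forecasting strong convective weather duration according to claim 1, wherein the model is scored using a formula (given only as an image in the original and not reproduced here) under which the sample score $S = 0$ when $\beta \geq 2\alpha$, $\alpha$ being the true value and $\beta$ the predicted value of the sample.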

Technical Field

The invention relates to the technical field of weather forecasting, and in particular to an ensemble-learning-based method for forecasting the duration of strong convective weather.

Background

Weather forecasting is the use of modern science and technology to predict the state of the Earth's atmosphere at a given location at a future time. Since prehistoric times, humans have tried to predict the weather in order to plan their work and lives (e.g., agricultural production and military operations). Today's weather forecasting mainly relies on collecting large amounts of data (air temperature, humidity, wind direction and speed, air pressure, etc.) and then applying current knowledge of atmospheric processes (meteorology) to determine how the atmosphere will change. Because atmospheric processes are chaotic and science still lacks a complete understanding of them, weather forecasts always contain some error.

In conventional forecasting, forecasters produce forecasts using synoptic methods, but their accuracy is limited. Although some algorithms improve the accuracy of strong convective weather prediction from different angles, they usually rely on a single prediction algorithm. In practice, data-processing requirements differ across regions, times and seasons. A forecasting model built on a single algorithm cannot capture the dynamically changing characteristics of the data, so its forecasts are generally unstable.

Disclosure of Invention

To address the technical problems described in the background, the invention provides an ensemble-learning-based method for forecasting the duration of strong convective weather.

The invention provides an ensemble-learning-based method for forecasting the duration of strong convective weather, comprising the following steps:

S1, data source selection: select the surface meteorological station data of the forecast area and the data of the two sounding stations closest to the forecast area;

S2, data preprocessing: remove erroneous and missing records; take the calculated strong convection forecast parameters as input and the duration of each strong convective weather event as output, setting the duration to 0 on days without strong convective weather; and normalize the forecast parameters, i.e., the input;

S3, algorithm selection: select the K-nearest neighbor algorithm, the polynomial regression algorithm, the decision tree algorithm and the neural network algorithm;

S4, ensemble learning execution flow: take the normalized meteorological feature data as input and train by cross-validation. Divide the data into 10 parts; the first 9 parts are used to train and validate each algorithm and the last part is reserved for testing. Within the first 9 parts, each training run uses 8 parts for training and 1 part for validating model accuracy, so that each algorithm yields 9 models and 9 validation scores. The highest-scoring model is selected as the optimal model of that algorithm and is tested on the last part, and the test score is taken as the final score of the algorithm. With an optimal model for each of the four algorithms tested on the part held out at the start, the two best algorithms by test score are selected as the final algorithms, and their predictions are summed and averaged to obtain the ensemble learning prediction.

Preferably, the relevant strong convection forecast parameters include the whole-layer integrated specific humidity, the A index, the K index, the modified K index, the total index, the modified total index, the effective convective potential energy, the convective condensation temperature, the free-lift convective temperature, the 0-3 km vertical wind vector difference (wind shear), the severe weather threat index, 700 hPa specific humidity, 700 hPa relative humidity, 850 hPa specific humidity, 850 hPa relative humidity and surface dew point temperature.

Preferably, the normalization uses mean-variance normalization to transform all input data to a distribution with a mean of 0 and a variance of 1.

Preferably, the K-nearest neighbor algorithm is used when the meteorological elements vary within a stable range, i.e., exhibit a continuous form.

Preferably, the polynomial regression algorithm is used when the weather forecast involves many elements whose relationship to the result is hardly linear.

Preferably, the decision tree algorithm is used when each meteorological element plays a greater or lesser role in the development of the strong convective weather process.

Preferably, the neural network algorithm is used when the relationship between the duration of strong convective weather and the meteorological elements of the day is complex.

Preferably, the model score is calculated with a formula (given as an image in the original and not reproduced here) under which $S = 0$ when $\beta \geq 2\alpha$, where $S$ is the score of a sample, $\alpha$ is the true value of the sample and $\beta$ is its predicted value. $\beta$ is a non-negative number; if the computed $\beta$ is less than 0, the score is taken as 0. For data containing multiple samples, the score of a model is obtained by averaging the sample scores $S$.

The method mainly uses the meteorological elements of the day on which strong convective weather occurs to infer how long it is likely to last. It combines the K-nearest neighbor, polynomial regression, decision tree and neural network algorithms and trains the models by cross-validation. Compared with conventional forecasting methods, model training takes fuller account of all the training data, i.e., of the various meteorological elements, and combining the optimal models trained with the four algorithms can give more accurate results than a single-algorithm model.

Detailed Description

The present invention will be further illustrated with reference to the following specific examples.

The invention provides an ensemble-learning-based method for forecasting the duration of strong convective weather, comprising the following steps:

S1, data source selection: select the surface meteorological station data of the forecast area and the data of the two sounding stations closest to the forecast area;

S2, data preprocessing: remove erroneous and missing records; take the calculated strong convection forecast parameters as input and the duration of each strong convective weather event (in minutes) as output, setting the duration to 0 on days without strong convective weather; and normalize the forecast parameters, i.e., the input;

S3, algorithm selection: select the K-nearest neighbor algorithm, the polynomial regression algorithm, the decision tree algorithm and the neural network algorithm;

S4, ensemble learning execution flow: the normalized meteorological feature data are used as input, which avoids the influence of features with different dimensions (units and scales) on the model, and the duration of strong convective weather is used as output. The models are trained by cross-validation to obtain a better-trained model. The data are divided into 10 parts: the first 9 parts are used to train and validate each algorithm, and the last part is used to test and score the model. Within the first 9 parts, each training run uses 8 parts to train the model and 1 part to validate its accuracy, so that each algorithm yields 9 models and 9 validation scores. The highest-scoring model is tested on the last part, and its test score is taken as the final score of the algorithm. Following this training method and scoring standard, 9 models are trained for each algorithm and the optimal model is selected by score. Each of the four algorithms thus has an optimal model, which is tested on the part held out at the start; the two best algorithms by test score are selected as the final algorithms, and their predictions are summed and averaged to obtain the ensemble learning prediction.
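The S4 flow above can be sketched as follows. This is a minimal, dependency-free illustration under several assumptions: the data are split into 10 contiguous parts without shuffling (the text does not say whether the data are shuffled); the score is a stand-in (negative mean absolute error) because the patent's scoring formula is not fully recoverable; and `MeanModel` and `KNNModel` are hypothetical placeholders for the four algorithms. Any object with `fit(X, y)` returning itself and `predict(X)`, such as a scikit-learn regressor, should fit the same interface.

```python
import math
import random


def score(y_true, y_pred):
    """ASSUMED stand-in score: negative mean absolute error (higher is better)."""
    return -sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


class MeanModel:
    """Hypothetical baseline: always predicts the training mean."""
    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean for _ in X]


class KNNModel:
    """Hypothetical k-nearest-neighbour regressor (Euclidean distance)."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        preds = []
        for x in X:
            nearest = sorted(zip(self.X, self.y), key=lambda p: math.dist(p[0], x))[: self.k]
            preds.append(sum(t for _, t in nearest) / len(nearest))
        return preds


def split_folds(n, n_folds=10):
    """Split indices 0..n-1 into n_folds contiguous parts of near-equal size."""
    size, extra = divmod(n, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        end = start + size + (1 if i < extra else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds


def take(seq, idx):
    return [seq[i] for i in idx]


def best_of_cv(factory, X, y, cv_folds):
    """Train on 8 of the 9 CV parts, validate on the 9th; keep the best of the 9 models."""
    best, best_score = None, -math.inf
    for k, val_idx in enumerate(cv_folds):
        train_idx = [i for j, fold in enumerate(cv_folds) if j != k for i in fold]
        model = factory().fit(take(X, train_idx), take(y, train_idx))
        s = score(take(y, val_idx), model.predict(take(X, val_idx)))
        if s > best_score:
            best, best_score = model, s
    return best


def ensemble_forecast(factories, X, y, X_new):
    """Rank each algorithm's best model on the held-out 10th part; average the top two."""
    folds = split_folds(len(X), 10)
    cv_folds, test_idx = folds[:9], folds[9]   # first 9 parts: CV; last part: test
    X_test, y_test = take(X, test_idx), take(y, test_idx)
    results = []
    for name, factory in factories.items():
        model = best_of_cv(factory, X, y, cv_folds)
        results.append((score(y_test, model.predict(X_test)), name, model))
    results.sort(key=lambda r: r[0], reverse=True)
    top_two = results[:2]
    p1, p2 = (m.predict(X_new) for _, _, m in top_two)
    return [name for _, name, _ in top_two], [(a + b) / 2 for a, b in zip(p1, p2)]


# Demo on synthetic data (made-up "durations" in minutes, for illustration only).
random.seed(0)
X = [[random.random(), random.random()] for _ in range(100)]
y = [60 * a + 30 * b for a, b in X]
factories = {"mean": MeanModel, "knn3": lambda: KNNModel(3), "knn5": lambda: KNNModel(5)}
chosen, forecast = ensemble_forecast(factories, X, y, [[0.5, 0.5]])
print(chosen, forecast)
```

In the demo, three candidates stand in for the four algorithms; `chosen` lists the two names kept after testing on the held-out part, and `forecast` holds their averaged prediction.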

Because all four algorithm models are trained by cross-validation, the strengths of each algorithm and the feature values and output value of every sample are fully taken into account. This reduces the overfitting that a single algorithm or uneven random sampling might cause, improves the overall generalization ability of the models, and gives better predictive performance on new data not seen during training. During training, the models also take fuller account of the various meteorological elements in the training data than conventional forecasting methods do. Moreover, the final result obtained by combining the optimal models trained with the four algorithms in this way can be more accurate than that of a conventional single-algorithm model.

Specifically, the relevant strong convection forecast parameters comprise the whole-layer integrated specific humidity, the A index, the K index, the modified K index, the total index, the modified total index, the effective convective potential energy, the convective condensation temperature, the free-lift convective temperature, the 0-3 km vertical wind vector difference (wind shear), the severe weather threat index, 700 hPa specific humidity, 700 hPa relative humidity, 850 hPa specific humidity, 850 hPa relative humidity and surface dew point temperature.
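The text lists these parameters but does not give their formulas. As an illustration of how such parameters are "calculated" from sounding data, the sketch below computes two of them using their standard textbook definitions, from temperatures (T) and dew points (Td) in °C at 850, 700 and 500 hPa. It assumes that the "total index" in the list corresponds to the total totals index; the function names are hypothetical.

```python
def k_index(t850, t700, t500, td850, td700):
    """K index (deg C), standard definition: (T850 - T500) + Td850 - (T700 - Td700)."""
    return (t850 - t500) + td850 - (t700 - td700)


def total_totals(t850, t500, td850):
    """Total totals index (deg C), assumed to be the 'total index':
    (T850 - T500) + (Td850 - T500)."""
    return (t850 - t500) + (td850 - t500)


# Example sounding values (illustrative, not from the patent).
print(k_index(t850=20, t700=8, t500=-10, td850=16, td700=4))  # 42
print(total_totals(t850=20, t500=-10, td850=16))              # 56
```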

Specifically, the normalization uses mean-variance normalization to transform the input data to a distribution with a mean of 0 and a variance of 1, which can improve the efficiency and accuracy of the machine learning algorithms.
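A minimal sketch of the S2 preprocessing, assuming records arrive as lists of numbers with `None` or NaN marking missing values. It drops incomplete records and applies mean-variance (z-score) normalization with the population standard deviation. The function names are hypothetical, and the text gives no rule for detecting erroneous (as opposed to missing) values, so that check is not implemented.

```python
import math


def drop_invalid(rows):
    """Remove records that contain a missing value (None or NaN)."""
    def ok(v):
        return v is not None and not (isinstance(v, float) and math.isnan(v))
    return [r for r in rows if all(ok(v) for v in r)]


def zscore_fit(rows):
    """Per-feature mean and population standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return means, stds


def zscore_apply(rows, means, stds):
    """Map each feature to mean 0, variance 1 (constant features map to 0)."""
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, means, stds)]
            for r in rows]


rows = drop_invalid([[1.0, 10.0], [2.0, None], [3.0, 30.0]])
means, stds = zscore_fit(rows)
print(zscore_apply(rows, means, stds))  # [[-1.0, -1.0], [1.0, 1.0]]
```

Returning the means and standard deviations separately lets the same scaling be reused on new forecast-day data; computing them on the training portion only is a common precaution, although the text simply says all input is normalized.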

Specifically, the K-nearest neighbor algorithm is used when the meteorological elements fluctuate within a stable range, i.e., exhibit a continuous form. The value assigned to a point being predicted is the average of the label values of its nearest neighbors.
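The averaging rule just described can be sketched in a few lines. The distance metric and `k` are not specified in the text; Euclidean distance and `k=3` are assumptions, and `knn_predict` is a hypothetical name.

```python
import math


def knn_predict(X_train, y_train, x, k=3):
    """Predict a label for x as the mean label of its k nearest training points."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Neighbours of 1.1 are 1.0, 2.0 and 0.0, so the prediction is (10 + 20 + 0) / 3.
print(knn_predict([[0.0], [1.0], [2.0], [10.0]], [0.0, 10.0, 20.0, 100.0], [1.1]))  # 10.0
```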

The polynomial regression algorithm is used when the weather forecast involves many elements and some of them hardly form a linear relationship with the result. A common approach in machine learning is to fit a linear function to the data, but this assumes the data are linearly related. In practice, most relationships in the data are nonlinear, and the data can be fitted well only after polynomial features are added, which is exactly what polynomial regression does.
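To make "adding polynomial features" concrete, here is a minimal sketch that expands each input vector with all monomials up to a given degree and then fits an ordinary least-squares model on the expanded features. The degree (2) and the normal-equation solver are illustrative choices; the text states neither the degree nor the fitting method, and all names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def poly_features(x, degree=2):
    """Bias term plus every monomial of the inputs up to `degree`."""
    feats = [1.0]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(x)), d):
            feats.append(prod(x[i] for i in combo))
    return feats


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][j] * w[j] for j in range(r + 1, n))) / M[r][r]
    return w


def fit_poly(X, y, degree=2):
    """Linear least squares on polynomial features, via the normal equations."""
    F = [poly_features(x, degree) for x in X]
    m = len(F[0])
    A = [[sum(f[i] * f[j] for f in F) for j in range(m)] for i in range(m)]
    b = [sum(f[i] * t for f, t in zip(F, y)) for i in range(m)]
    return solve(A, b)


def predict_poly(w, x, degree=2):
    return sum(wi * fi for wi, fi in zip(w, poly_features(x, degree)))


# y = 1 + 2x + 3x^2 is nonlinear in x but linear in the features [1, x, x^2].
w = fit_poly([[0.0], [1.0], [2.0], [3.0], [4.0]], [1.0, 6.0, 17.0, 34.0, 57.0])
print(round(predict_poly(w, [5.0]), 6))
```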

The decision tree algorithm is used when each meteorological element plays a greater or lesser role in the development of a strong convective weather process. Decision trees are a non-parametric supervised learning method for classification and regression; the goal is to build a model that predicts the value of the target variable by learning decision rules inferred from the data features.
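A minimal regression-tree sketch showing how decision rules are "inferred from the data features": at each node it picks the feature and threshold whose split minimizes the total squared error, and leaves predict the mean duration of their samples. The text does not state the split criterion or depth; the squared-error criterion, `max_depth=3` and `min_samples=2` are assumptions, and the names are hypothetical.

```python
def sse(ys):
    """Sum of squared deviations from the mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Leaf = mean label (float); internal node = (feature, threshold, left, right)."""
    mean = sum(y) / len(y)
    if depth >= max_depth or len(y) < min_samples:
        return mean
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(x[f] for x in X)):
            left = [t for x, t in zip(X, y) if x[f] <= thr]
            right = [t for x, t in zip(X, y) if x[f] > thr]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, thr)
    if best is None:
        return mean
    _, f, thr = best
    L = [(x, t) for x, t in zip(X, y) if x[f] <= thr]
    R = [(x, t) for x, t in zip(X, y) if x[f] > thr]
    return (f, thr,
            build_tree([x for x, _ in L], [t for _, t in L], depth + 1, max_depth, min_samples),
            build_tree([x for x, _ in R], [t for _, t in R], depth + 1, max_depth, min_samples))


def tree_predict(node, x):
    """Follow the learned rules down to a leaf."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


tree = build_tree([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]],
                  [0.0, 0.0, 0.0, 60.0, 60.0, 60.0])
print(tree_predict(tree, [2.5]), tree_predict(tree, [11.5]))  # 0.0 60.0
```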

The neural network algorithm is used when the relationship between the duration of strong convective weather and the meteorological elements of the day is complex. Because the task is a relatively small-scale numerical prediction, a lightweight neural network such as a multilayer perceptron is used. A multilayer perceptron can handle more complex nonlinear problems and fits nonlinear data better.
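A minimal multilayer-perceptron sketch for regression: one tanh hidden layer, a linear output, and plain stochastic gradient descent on squared error. The text only says a lightweight network such as an MLP is used; the layer size, activation, learning rate, epoch count and the class name `TinyMLP` are hypothetical choices for illustration.

```python
import math
import random


class TinyMLP:
    """One-hidden-layer perceptron for regression (tanh hidden, linear output)."""

    def __init__(self, n_in, n_hidden=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.gauss(0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.W2, h)) + self.b2

    def predict(self, x):
        return self._forward(x)[1]

    def train_step(self, x, y):
        """One SGD step on the loss 0.5 * (y_hat - y) ** 2."""
        h, y_hat = self._forward(x)
        err = y_hat - y
        for j, hj in enumerate(h):
            grad_pre = err * self.W2[j] * (1 - hj * hj)   # backprop through tanh
            self.W2[j] -= self.lr * err * hj
            for i, xi in enumerate(x):
                self.W1[j][i] -= self.lr * grad_pre * xi
            self.b1[j] -= self.lr * grad_pre
        self.b2 -= self.lr * err


def mse(model, X, y):
    return sum((model.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


# Fit a simple nonlinear target y = x^2 on [-1, 1].
X = [[i / 10] for i in range(-10, 11)]
y = [x[0] ** 2 for x in X]
net = TinyMLP(n_in=1)
before = mse(net, X, y)
for epoch in range(300):
    for x, t in zip(X, y):
        net.train_step(x, t)
after = mse(net, X, y)
print(before, after)
```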

Specifically, the model score is calculated with a formula under which $S = 0$ when $\beta \geq 2\alpha$, where $S$ is the score of a sample, $\alpha$ is the true value of the sample and $\beta$ is its predicted value. $\beta$ is a non-negative number; if the computed $\beta$ is less than 0, the score is taken as 0.
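Only part of the scoring formula survives in this text, so the sketch below encodes just the stated rules: a sample scores 0 when $\beta \geq 2\alpha$ or when $\beta < 0$ (reading the "taken as 0" rule as per-sample), and a model's score is the mean of its sample scores, as stated in the summary of the invention. The score for $0 \le \beta < 2\alpha$ is in the unreproduced formula image, so it is passed in as a HYPOTHETICAL callable `in_range_score`; the constant used in the example is a placeholder, not the patent's formula.

```python
def sample_score(alpha, beta, in_range_score):
    """Per-sample score S from the stated rules.

    alpha: true value; beta: predicted value.
    in_range_score: HYPOTHETICAL callable for 0 <= beta < 2 * alpha
    (the original formula for this case is not recoverable here).
    """
    if beta < 0 or beta >= 2 * alpha:
        return 0.0
    return in_range_score(alpha, beta)


def model_score(alphas, betas, in_range_score):
    """Model score = mean of the per-sample scores S."""
    scores = [sample_score(a, b, in_range_score) for a, b in zip(alphas, betas)]
    return sum(scores) / len(scores)


placeholder = lambda a, b: 1.0   # placeholder only, NOT the patent's formula
print(model_score([10.0, 10.0, 10.0], [5.0, 25.0, -1.0], placeholder))
```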

The above is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited thereto. Any equivalent substitution or modification of the technical solution and inventive concept of the invention that a person skilled in the art could make within the technical scope disclosed herein shall fall within the scope of protection of the invention.
