Method for predicting cabinet inlet air temperature based on reinforcement learning model

Document No. 1085538 · Published 2020-10-20

Reading note: This technique, "Method for predicting cabinet inlet air temperature based on reinforcement learning model," was created by 周兴东, 郑贤清, 张士蒙, and 任群 on 2020-06-24. Abstract: The invention relates to the field of artificial intelligence, and in particular to a method for predicting the inlet air temperature of a cabinet based on a reinforcement learning model. The method comprises the following steps. Step 1: collect actual surface temperature data of the cabinet with a thermal imaging device, and collect the corresponding actual inlet air temperature data with a thermosensitive device. Step 2: train a neural network model, using the actual surface temperature data of the cabinet as input and the actual inlet air temperature data as output, repeating the training until the model can predict simulated inlet air temperature data for the cabinet. Step 3: establish a reinforcement learning model. Step 4: obtain the neural network model under the optimal policy of the reinforcement learning model to generate a new predictor. Step 5: predict the inlet air temperature of the cabinet with the optimal predictor. The method improves the accuracy of the simulated inlet air temperature data of the cabinet, saves material and labor costs, and is convenient to use.

1. A method for predicting the inlet air temperature of a cabinet based on a reinforcement learning model comprises the following steps:

step 1: acquiring actual surface temperature data of the cabinet through a thermal imaging device, and acquiring actual inlet air temperature data of the corresponding cabinet through a thermosensitive device;

step 2: calling a neural network model for training, taking the actual surface temperature data of the cabinet as input and the actual inlet air temperature data of the cabinet as output for repeated training, so that after training the neural network model can predict the simulated inlet air temperature data of the cabinet;

step 3: establishing a reinforcement learning model, wherein:

the neural network model serves as the Agent of the reinforcement learning model;

each prediction made by the neural network is an Action;

the MSE between the simulated inlet air temperature and the actual inlet air temperature serves as the Environment;

the magnitude of this MSE serves as the basis for setting the Reward;

step 4: obtaining the neural network model under the optimal policy of the reinforcement learning model to generate a new predictor;

step 5: predicting the inlet air temperature of the cabinet with the optimal predictor.

2. The method for predicting the inlet air temperature of the cabinet based on the reinforcement learning model as claimed in claim 1, wherein: in step 1, the thermal imaging device is an infrared thermal imager.

3. The method for predicting the inlet air temperature of the cabinet based on the reinforcement learning model as claimed in claim 1, wherein: in step 1, the thermosensitive device is a thermosensitive sensor.

4. The method for predicting the inlet air temperature of the cabinet based on the reinforcement learning model as claimed in claim 1, wherein: in step 4, the Reward of the reinforcement learning model is set according to the following rules:

when the MSE between the simulated inlet air temperature and the actual inlet air temperature falls in the interval [2, +∞), the evaluation index is -100;

when the MSE falls in [1, 2), the evaluation index is -10;

when the MSE falls in [0.5, 1), the evaluation index is -1;

when the MSE falls in [0, 0.5), the evaluation index is +100.

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a method for predicting the inlet air temperature of a cabinet based on a reinforcement learning model.

Background

Measuring the inlet air temperature of the cabinets in a machine-room data center is an unavoidable topic in machine-room design, and the temperature field of the room must be surveyed before any energy-saving retrofit. The existing approach uses a handheld thermosensitive thermometer to measure the inlet airflow temperature of each cabinet in turn, which consumes excessive time and cost. Moreover, the temperature field of a data center changes constantly: if, for example, the state of an air conditioner changes during the survey, the measured temperature field is no longer the one of interest. Infrared thermal imaging can instead collect temperature information for the whole machine room in a short time, but it measures only the surface temperature of the cabinets, which cannot represent the temperature of the inlet airflow. Because a data center is a complex time-varying environment, conventional methods based on a BP neural network also fail to solve the inlet-air-temperature prediction problem well. In view of this, a method for predicting the cabinet inlet air temperature based on a reinforcement learning model is provided.

Disclosure of Invention

The invention aims to provide a method for predicting the inlet air temperature of a cabinet based on a reinforcement learning model, so as to solve the problems in the background technology.

In order to achieve the purpose, the invention provides the following technical scheme:

a method for predicting the inlet air temperature of a cabinet based on a reinforcement learning model comprises the following steps:

step 1: acquiring actual surface temperature data of the cabinet through a thermal imaging device, and acquiring actual inlet air temperature data of the corresponding cabinet through a thermosensitive device;

step 2: calling a neural network model for training, taking the actual surface temperature data of the cabinet as input and the actual inlet air temperature data of the cabinet as output for repeated training, so that after training the neural network model can predict the simulated inlet air temperature data of the cabinet;

step 3: establishing a reinforcement learning model, wherein:

the neural network model serves as the Agent of the reinforcement learning model;

each prediction made by the neural network is an Action;

the MSE between the simulated inlet air temperature and the actual inlet air temperature serves as the Environment;

the magnitude of this MSE serves as the basis for setting the Reward;

step 4: obtaining the neural network model under the optimal policy of the reinforcement learning model to generate a new predictor;

step 5: predicting the inlet air temperature of the cabinet with the optimal predictor.

Preferably, in step 1, the thermal imaging device is an infrared thermal imager.

Preferably, in step 1, the thermosensitive device is a thermosensitive sensor.

Preferably, in step 4, the Reward of the reinforcement learning model is set according to the following rules:

when the MSE between the simulated inlet air temperature and the actual inlet air temperature falls in the interval [2, +∞), the evaluation index is -100;

when the MSE falls in [1, 2), the evaluation index is -10;

when the MSE falls in [0.5, 1), the evaluation index is -1;

when the MSE falls in [0, 0.5), the evaluation index is +100.

Compared with the prior art, the invention has the following beneficial effects. By acquiring the actual surface temperature data and the actual inlet air temperature data of the cabinet, the method can monitor the surface temperature and the inlet air temperature in real time, and the inputs and outputs of the neural network model can be updated promptly from the detected temperature data. Continuous training and learning improve the accuracy of the neural network model's predictions and therefore of the simulated inlet air temperature data. The method shortens the time needed for the preliminary survey of a machine-room data center, reduces the number of thermosensitive probes that must be installed during later retrofitting, cuts the on-site construction workload, and saves material and labor costs, providing a new, fast, and effective approach for later periodic inspection that is convenient to use.

Drawings

FIG. 1 is a flow chart of the overall steps of the present invention;

FIG. 2 is a line graph showing the variation of the actual inlet air temperature data and the simulated inlet air temperature data of the cabinet at the same time;

fig. 3 is a line graph showing error value variation at the same time for the actual inlet air temperature data and the simulated inlet air temperature data of the cabinet of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1-3, a technical solution provided by the present invention is:

a method for predicting the inlet air temperature of a cabinet based on a reinforcement learning model comprises the following steps:

step 1: acquiring actual surface temperature data of the cabinet through a thermal imaging device, and acquiring actual inlet air temperature data of the corresponding cabinet through a thermosensitive device;

step 2: calling a neural network model for training, taking the actual surface temperature data of the cabinet as input and the actual inlet air temperature data of the cabinet as output for repeated training, so that after training the neural network model can predict the simulated inlet air temperature data of the cabinet;

step 3: establishing a reinforcement learning model, wherein:

the neural network model serves as the Agent of the reinforcement learning model;

each prediction made by the neural network is an Action;

the MSE (mean square error) between the simulated inlet air temperature and the actual inlet air temperature serves as the Environment;

the magnitude of this MSE serves as the basis for setting the Reward;

step 4: obtaining the neural network model under the optimal policy of the reinforcement learning model to generate a new predictor;

step 5: predicting the inlet air temperature of the cabinet with the optimal predictor.
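Before the reinforcement-learning layer is added, the core of step 2 is learning a mapping from cabinet surface temperature to inlet air temperature. The sketch below illustrates that idea on synthetic data with a least-squares linear fit; the linear relation, its coefficients, and the noise level are illustrative assumptions, not measured data, and the invention itself uses a neural network rather than a linear model.

```python
import numpy as np

# Synthetic stand-in for step 2: relate cabinet surface temperature to
# inlet air temperature. The assumed relation inlet = 0.6*surface + 5
# plus Gaussian noise is purely illustrative.
rng = np.random.default_rng(0)
surface = rng.uniform(20.0, 40.0, size=200)                # surface temps (deg C)
inlet = 0.6 * surface + 5.0 + rng.normal(0.0, 0.2, 200)    # inlet temps (deg C)

# Fit inlet ~ w * surface + b by least squares.
A = np.column_stack([surface, np.ones_like(surface)])
(w, b), *_ = np.linalg.lstsq(A, inlet, rcond=None)

pred = w * surface + b                       # simulated inlet temperature
mse = float(np.mean((pred - inlet) ** 2))    # simulated-vs-actual MSE
```

On this synthetic data the fit recovers the assumed slope of 0.6 and the MSE stays well below the 0.5 threshold used in the Reward rules below.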

In this embodiment, in step 1, the thermal imaging device is an infrared thermal imager, and the thermosensitive device is a thermosensitive sensor.

Further, in step 4, the Reward of the reinforcement learning model is set according to the following rules:

when the MSE (mean square error) between the simulated inlet air temperature and the actual inlet air temperature falls in the interval [2, +∞), the evaluation index is -100;

when the MSE falls in [1, 2), the evaluation index is -10;

when the MSE falls in [0.5, 1), the evaluation index is -1;

when the MSE falls in [0, 0.5), the evaluation index is +100.
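The four interval rules above translate directly into code. This is a minimal sketch under the assumption that the reward simply equals the stated evaluation index:

```python
def reward(mse: float) -> int:
    """Map the MSE between simulated and actual inlet air temperature
    to a reward, following the interval rules stated above."""
    if mse >= 2.0:
        return -100   # MSE in [2, +inf)
    if mse >= 1.0:
        return -10    # MSE in [1, 2)
    if mse >= 0.5:
        return -1     # MSE in [0.5, 1)
    return 100        # MSE in [0, 0.5)
```

Because the intervals are half-open, an MSE of exactly 2 yields -100 and an MSE of exactly 0.5 yields -1.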

It is worth noting that the BP algorithm is a supervised learning algorithm for networks containing hidden nodes. For an input sample, an output is obtained by forward propagation through the network and then compared with the desired output sample. If there is a deviation, the error propagates back from the output, adjusting the weight coefficients Wji.

Let X be the input sample, Y the output sample, T the expected output sample, η the learning rate (a positive number less than 1), f(x) the activation function of the network (the sigmoid curve is chosen), Wji the weight coefficient of the connection from the i-th unit to the j-th unit, and f'(x) the derivative of f(x). Forward propagation proceeds from the input to the output layer by layer, with the output of each layer serving as the input of the next:

forward propagation:

yj = f(netj),  netj = ∑i Wji·xi

where

f(x) = 1/(1 + exp(-x))

the learning process comprises the following steps:

Wji(n+1) = Wji(n) + η·δj·xi

for output nodes:

δj = f'(netj)·(Tj - yj)

for non-output nodes:

δj = f'(netj)·∑k δk·Wkj

the slow convergence of the BP algorithm is caused by the fact that errors are complex nonlinear functions of time, while the BP algorithm is a simple steepest descent method in nature, and weight adjustment is based on partial derivatives of the errors on weights. I.e., in the direction of least rate of change of error, and f (x) as convergence approaches, resulting in slow convergence. The initial value is a small random number and the weight increments:

ΔW′ji=η·j·x′i

the coefficients in this equation are modified to different degrees but the values remain the same, resulting in over-modification of some of the coefficients, so that convergence occurs only when η is small.

Specifically, the preferred algorithm used in the present invention is the Q-Learning algorithm, and other algorithms are not listed.

Q-Learning is a value-based reinforcement learning algorithm, where Q stands for Q(s, a): the expected return obtained by taking action a (a ∈ A) in state s (s ∈ S) at a given moment. The environment feeds back a corresponding reward according to the agent's action, so the main idea of the algorithm is to build a Q-table indexed by State and Action to store Q values, and then select the action that yields the greatest return according to those Q values.

Q-Table  a1        a2        a3        …  an
s1       q(s1,a1)  q(s1,a2)  q(s1,a3)  …  q(s1,an)
s2       q(s2,a1)  q(s2,a2)  q(s2,a3)  …  q(s2,an)
s3       q(s3,a1)  q(s3,a2)  q(s3,a3)  …  q(s3,an)
…        …         …         …         …  …
sn       q(sn,a1)  q(sn,a2)  q(sn,a3)  …  q(sn,an)

With the agent, the environment state, the reward, and the action, the problem can be abstracted as a Markov decision process. Each grid cell is treated as a state St, and π(a|s) denotes the policy of taking action a in state s.

P(s'|s, a) is the probability of transitioning to the next state s' when action a is selected in state s. R(s'|s, a) denotes the reward obtained by taking action a in state s and transitioning to s'. The goal is to find a policy that achieves the maximum reward.

The Bellman equation is used to solve the optimal decision sequence of the Markov decision process. The state value function Vπ(s) evaluates the current state; the value of each state is determined not only by the current state but also by the states that follow, so taking the expectation of the cumulative reward over those states gives the state value function of the current state:

Vπ(s) = Eπ[Rt+1 + γ·Vπ(St+1) | St = s]

The optimal cumulative expectation is represented by V*(s):

V*(s) = maxπ Vπ(s)

optimal action-value function:

Q*(s,a) = maxπ Qπ(s,a)

Expanding gives:

Q*(s,a) = ∑s' P(s'|s,a)·(R(s,a,s') + γ·maxa' Q*(s',a'))

the Bellman equation is actually a transfer of the cost action function:

Figure BDA0002554870370000062

Q-learning update formula:

Q(s,a) ← Q(s,a) + α·[R + γ·maxa' Q(s',a') - Q(s,a)]

the largest value of Q (s ', alpha ') is selected from the next state s ' to be multiplied by the decay gamma plus the true return value as Q reality, and Q (s ', alpha ') in the past Q table is used as Q estimation.

When the method for predicting the inlet air temperature of the cabinet based on the reinforcement learning model is used, the actual surface temperature data of the cabinet is acquired through the thermal imaging device and the corresponding actual inlet air temperature data through the thermosensitive device, so the surface temperature and the inlet air temperature of the cabinet can be monitored in real time. The inputs and outputs of the neural network model are updated promptly from the detected temperature data, and continuous training and learning improve the accuracy of the model's predictions and hence of the simulated inlet air temperature data. Compared with a single neural network model, the generalization of the model is improved and the range of applicable scenarios is wider: the model structure and parameters can be updated online in real time for different application scenarios, and continuous learning raises the prediction accuracy so that the cabinet inlet air temperature is predicted more precisely. The method shortens the time needed for the preliminary survey of the machine-room data center, reduces the number of thermosensitive probes that must be installed during later retrofitting, cuts the on-site construction workload, and saves material and labor costs, providing a new, fast, and effective approach for later periodic inspection that is convenient to use and popularize.

The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principles of the invention and are not intended to limit it. The scope of the invention is defined by the appended claims and their equivalents.
