Information-agnostic Coflow scheduling system capable of automatically adjusting queue threshold and scheduling method thereof


Reading note: This technology, "自动调整队列阈值的信息无感知Coflow调度系统及其调度方法" (Information-agnostic Coflow scheduling system capable of automatically adjusting queue threshold and scheduling method thereof), was designed and created by 汪硕 (Wang Shuo), 王速 (Wang Su), 黄韬 (Huang Tao), 霍如 (Huo Ru) and 刘韵洁 (Liu Yunjie) on 2019-09-25. Its main content is as follows: the invention discloses an information-unaware Coflow scheduling system that automatically adjusts queue thresholds, and a scheduling method thereof. The scheduling system comprises a Monitoring System (MS) distributed on the end hosts and a Central Controller (CC); the monitoring system collects Coflow information on the end hosts, the central controller adjusts the set of degradation thresholds according to this information, observes the Coflow scheduling results as the reward of a reinforcement learning algorithm, and further optimizes the queue degradation thresholds through that reward.

1. An information-unaware Coflow scheduling system for automatically adjusting queue thresholds, characterized in that the scheduling system comprises a Monitoring System (MS) distributed on the end hosts and a Central Controller (CC);

the monitoring system is used for collecting Coflow information on the end hosts; the central controller adjusts the set of degradation thresholds according to the Coflow information, observes the Coflow scheduling results as the reward of a reinforcement learning algorithm, and further optimizes the queue degradation thresholds through this reward.

2. An information-unaware Coflow scheduling method for automatically adjusting queue thresholds, characterized by comprising the following steps:

Step 1: set queue thresholds $\alpha_i$ for a queue hierarchy formed by $K$ priorities $Q_1, Q_2, \dots, Q_K$, with the priority of the queues decreasing gradually from $Q_1$ to $Q_K$;

Step 2: when a new Coflow arrives, it enters the highest-priority queue; when the number of bytes sent by the Coflow exceeds the queue threshold $\alpha_i$, the Coflow is demoted from $Q_i$ to $Q_{i+1}$, and the Coflow is removed from the queues once its scheduling completes; the monitoring system MS collects the sizes and completion times of completed Coflows and reports the collected Coflow information to the central controller CC at each time step $t$;

Step 3: the central controller CC trains a model with the Deep Deterministic Policy Gradient (DDPG) algorithm to update the parameters of the neural network; after the CC receives the Coflow state information collected by the MS at each time step $t$, it outputs a set of queue thresholds $\{\alpha_1, \alpha_2, \dots, \alpha_K\}$, and the central controller CC uses a reinforcement learning procedure to automate the decision making.

3. The information-unaware Coflow scheduling method for automatically adjusting queue thresholds according to claim 2, wherein step 3 specifically comprises:

Step 3.1: construct the state space; let $c_j^t$ denote the $j$-th Coflow that completes its transfer in the $t$-th time step, identified by its five-tuple $(N_f, S_f(\min), S_f(\max), S_f(\mathrm{ave}), S_C)$; the state space is represented as the set of all Coflows completed in the data center within the $t$-th time step,

$$ s_t = \{c_1^t, c_2^t, \dots\}; $$

Step 3.2: at the $t$-th time step, after receiving the Coflow information collected by the monitoring system, the central controller outputs a set of queue degradation thresholds

$$ a_t = \{\alpha_1^t, \alpha_2^t, \dots, \alpha_K^t\}, $$

where $\alpha_i^t$ denotes the threshold of the $i$-th queue at time step $t$;

Step 3.3: using the degradation thresholds $a_t$, collect the average completion time of the completed Coflows to calculate the reward, the reward signal being designed with the goal of minimizing Coflow completion time:

$$ r_t = \sum_{c \in C_E^t} \frac{S_c}{T_c} \;-\; \sum_{c \in C_E^{t-1}} \frac{S_c}{T_c} \qquad (1) $$

where $S_c / T_c$ denotes the total number of bytes each completed Coflow $c$ transmits per unit time and $C_E^t$ denotes the set of all Coflows completed at time step $t$.

4. The information-unaware Coflow scheduling method for automatically adjusting queue thresholds according to claim 3, wherein the DDPG algorithm is used to train a neural network, the policy is represented as the neural network, the set of completed Coflows is taken as input, and a set of queue degradation thresholds is output;

at each time step, the neural network receives the latest state $s_t$ from the hosts and stores the tuple $(s_{t+1}, s_t, a_t, r_t)$ in its buffer for the next step of learning, where $s_{t+1}$ and $r_t$ are calculated in the next training step;

by comparing the throughput of the whole network at time step $t$ with that at time step $t-1$, a reward signal is obtained at time step $t$ through formula (1); the central controller receives the reward signal as feedback on the action generated at time step $t$; the update step uses the DDPG algorithm to train the actor-critic network continuously, and iteration stops when the policy no longer changes, at which point training is finished and the optimal solution of the scheduling mechanism is obtained.

Technical Field

The invention relates to an information-unaware Coflow scheduling system for automatically adjusting queue thresholds and a scheduling method thereof, and belongs to the technical field of communications.

Background

In a Data Center Network (DCN), parallel computing models such as MapReduce, Spark and Dryad have been widely used to support business applications and scientific research. These different computing frameworks share one common feature: they all have successive computation stages between clusters of machines, and only when all data of one stage has been completely sent can the task start the next data processing stage. Specifically, a Coflow is defined as a collection of flows between two sets of machines. An example of a Coflow is the Shuffle process between Mapper and Reducer in MapReduce. Related work has shown that Coflow completion time accounts for more than 33% of total computation task completion time, in some cases even up to 50%. Therefore, optimizing the data transmission of parallel computing tasks is very important for improving application performance in the DCN.

Existing Coflow scheduling mechanisms can be divided into information-aware and information-unaware (information-agnostic) Coflow scheduling mechanisms. An information-aware Coflow scheduling mechanism needs to acquire in advance information such as the total amount of data a Coflow must send and its transmission delay requirements, whereas an information-unaware Coflow scheduling mechanism needs no such prior information. Although information-aware Coflow scheduling mechanisms perform well, they are difficult to deploy in a real data center because detailed Coflow information is hard to obtain there in advance. To increase deployability, information-unaware Coflow scheduling mechanisms such as Aalo have been proposed, which complete the Coflow scheduling process without acquiring Coflow information in advance, or knowing only part of it. The core idea of these mechanisms is a rate allocation algorithm based on the number of bytes already sent. Specifically, they place Coflows in different priority queues according to how much data has been sent, and a Coflow is gradually demoted from the highest priority to lower-priority queues as the number of bytes it has sent exceeds predefined thresholds.

The information-unaware Aalo mechanism approximates a smallest-Coflow-first policy to schedule Coflows. Since the exact size of a Coflow is not known, Aalo uses the sum of the bytes the Coflow has already sent, counted at each port, as an estimate of the Coflow size. Aalo implements multi-level scheduling based on this estimate: it sets discrete thresholds, and when the number of bytes sent by a Coflow exceeds a predefined threshold, the Coflow is demoted from the highest-priority queue toward lower-priority queues. In effect, all in-flight Coflows are assigned to queues of different priorities based on how much data they have already sent.

When allocating bandwidth, Aalo distributes bandwidth to all queues according to the priority weight of each queue (the higher the priority, the larger the weight), so that every queue can obtain bandwidth at any time, avoiding head-of-line blocking and Coflows being left in an indefinite waiting state. In addition, to keep the schedule work-conserving, any excess bandwidth a queue holds is redistributed with a max-min fairness algorithm according to the priority weights of the unsaturated queues. Within each queue, Coflows are scheduled on a first-in-first-out basis, and the sub-flows of each Coflow are scheduled with a max-min fairness algorithm.
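As an illustration of this weighted sharing, the following is a minimal Python sketch, assuming illustrative weights, demands, and a function name not taken from the Aalo paper: link capacity is split in proportion to the queues' priority weights, and shares left over by saturated queues are redistributed to the unsaturated ones in max-min fashion.

```python
# Minimal sketch of weighted sharing with max-min redistribution of excess.
# Weights, demands, and the function name are illustrative assumptions.

def allocate_bandwidth(capacity, weights, demands):
    """Split `capacity` across queues in proportion to `weights`, then
    redistribute the leftover of saturated queues (demand met) to the
    still-unsatisfied queues, max-min style."""
    alloc = [0.0] * len(weights)
    active = set(range(len(weights)))          # queues still wanting bandwidth
    remaining = capacity
    while remaining > 1e-9 and active:
        total_w = sum(weights[i] for i in active)
        spare = 0.0
        for i in list(active):
            share = remaining * weights[i] / total_w
            need = demands[i] - alloc[i]
            if share >= need:                  # queue saturates; free the excess
                alloc[i] = demands[i]
                spare += share - need
                active.remove(i)
            else:
                alloc[i] += share
        remaining = spare
    return alloc

# Example: three queues with priority weights 4:2:1 sharing a 10 Gbps link.
print(allocate_bandwidth(10.0, [4, 2, 1], [3.0, 8.0, 8.0]))
```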

In the Aalo mechanism, the problem of choosing optimal queue priority thresholds remains unsolved. The degradation thresholds in Aalo are static, and the reasonableness of the queue thresholds depends on the operator's ability to pre-select good values. When the predefined thresholds do not match the traffic in the data center, the performance of the Coflow scheduling mechanism drops sharply. Therefore, in dynamic clusters of different sizes the thresholds should be adjusted correspondingly, but the Aalo mechanism provides no suitable metric for setting thresholds effectively for different clusters.

For the threshold selection problem in the Aalo mechanism, the literature analyzes how to select the queue thresholds and proposes a Down-hill Searching (DHS) algorithm for designing them. Specifically, the Discretized Coflow-Aware Least-Attained Service (D-CLAS) system proposed in Aalo is modeled as an M/G/1 queue, and the average Coflow delay is expressed as a function of the degradation thresholds. Furthermore, the valley shape of this function is demonstrated, and the DHS algorithm is designed to locate a set of optimal degradation thresholds that minimizes the average Coflow completion time in the system.

While the DHS algorithm improves the mean completion time of Coflows, the overall design cycle of such heuristics is very long, typically several weeks or more. Since collecting statistics and analyzing the data usually takes a long time, the thresholds are typically obtained from long-term observation of the Coflow size distribution.

Disclosure of Invention

In order to overcome these problems, the invention provides an information-unaware Coflow scheduling mechanism, DeepAalo, which minimizes Coflow completion time by automatically adjusting the priority-queue thresholds. Under a stable traffic distribution, DeepAalo greatly reduces the average Coflow completion time, and its performance is far better than Aalo's when scheduling larger Coflows.

The technical scheme of the invention is as follows:

An information-unaware Coflow scheduling system for automatically adjusting queue thresholds, wherein the scheduling system comprises a Monitoring System (MS) distributed on the end hosts and a Central Controller (CC);

the monitoring system is used for collecting Coflow information on the end hosts; the central controller adjusts the set of degradation thresholds according to the Coflow information, observes the Coflow scheduling results as the reward of a reinforcement learning algorithm, and further optimizes the queue degradation thresholds through this reward.

Further, the method comprises the following steps:

Step 1: set queue thresholds $\alpha_i$ for a queue hierarchy formed by $K$ priorities $Q_1, Q_2, \dots, Q_K$, with the priority of the queues decreasing gradually from $Q_1$ to $Q_K$;

Step 2: when a new Coflow arrives, it enters the highest-priority queue; when the number of bytes sent by the Coflow exceeds the queue threshold $\alpha_i$, the Coflow is demoted from $Q_i$ to $Q_{i+1}$, and the Coflow is removed from the queues once its scheduling completes; the monitoring system MS collects the sizes and completion times of completed Coflows and reports the collected Coflow information to the central controller CC at each time step $t$;

Step 3: the central controller CC trains a model with the Deep Deterministic Policy Gradient (DDPG) algorithm to update the parameters of the neural network; after the CC receives the Coflow state information collected by the MS at each time step $t$, it outputs a set of queue thresholds $\{\alpha_1, \alpha_2, \dots, \alpha_K\}$, and the central controller CC makes decisions automatically using a reinforcement learning procedure.

Further, the third step is specifically:

Step 3.1: construct the state space. Let $c_j^t$ denote the $j$-th Coflow that completes its transfer in the $t$-th time step. The state space is represented as the set of all Coflows completed in the data center within the $t$-th time step:

$$ s_t = \{c_1^t, c_2^t, \dots\} $$

Each Coflow contains its five-tuple identification $(N_f, S_f(\min), S_f(\max), S_f(\mathrm{ave}), S_C)$, where $N_f$ denotes the number of sub-flows in the Coflow and the other four attributes denote, respectively, the minimum sub-flow size, the maximum sub-flow size, the average sub-flow size, and the total number of bytes of the Coflow;

Step 3.2: at the $t$-th time step, after receiving the Coflow information collected by the monitoring system, the central controller outputs a set of queue degradation thresholds

$$ a_t = \{\alpha_1^t, \alpha_2^t, \dots, \alpha_K^t\}, $$

where $\alpha_i^t$ denotes the threshold of the $i$-th queue at time step $t$;

Step 3.3: using the degradation thresholds $a_t$, collect the average completion time of the completed Coflows to calculate the reward, the reward signal being designed with the goal of minimizing Coflow completion time:

$$ r_t = \sum_{c \in C_E^t} \frac{S_c}{T_c} \;-\; \sum_{c \in C_E^{t-1}} \frac{S_c}{T_c} \qquad (1) $$

where $S_c / T_c$ denotes the total number of bytes each completed Coflow $c$ transmits per unit time and $C_E^t$ denotes the set of all Coflows completed at time step $t$.

Further, the DDPG algorithm is used to train a neural network; the policy is represented as the neural network, which takes the set of completed Coflows as input and outputs a set of queue degradation thresholds;

at each time step, the neural network receives the latest state $s_t$ from the hosts and stores the tuple $(s_{t+1}, s_t, a_t, r_t)$ in its buffer for the next step of learning, where $s_{t+1}$ and $r_t$ are calculated in the next training step;

by comparing the throughput of the whole network at time step $t$ with that at time step $t-1$, a reward signal is obtained at time step $t$ through formula (1); the central controller receives the reward signal as feedback on the action generated at time step $t$; the update step uses the DDPG algorithm to train the actor-critic network continuously, and iteration stops when the policy no longer changes, at which point training is finished and the optimal solution of the scheduling mechanism is obtained.

Compared with the prior art, the invention has the following advantages:

(1) Effect under a stable traffic distribution

Under a stable traffic distribution (fixed Coflow size distribution and network load), the average Coflow completion time of DeepAalo is 1.37 times lower than that of the Aalo mechanism after a training process of several hours. In particular, when scheduling larger Coflows, the performance of the DeepAalo mechanism is much better than Aalo's, with the average Coflow completion time reduced by up to 28.01%.

(2) Effect of network load variation

For the static-threshold scheme Aalo, performance degrades rapidly when the traffic characteristics do not match the parameter settings. DeepAalo exhibits good adaptability and stability when traffic characteristics change, because it can dynamically adjust the priorities of Coflows by changing the queue thresholds, thereby obtaining better Coflow scheduling results.

Brief description of the drawings

Fig. 1 is an architecture diagram of the information-unaware Coflow scheduling system for automatically adjusting queue thresholds according to the present invention.

Detailed Description

The technical solutions of the embodiments of the present invention will be described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention improves the Aalo mechanism and provides an information-unaware Coflow scheduling mechanism, DeepAalo, which minimizes Coflow completion time by automatically adjusting the priority-queue thresholds. DeepAalo uses deep reinforcement learning to turn threshold design into a continuous learning process. Specifically, DeepAalo trains a neural network model that automatically updates the queue degradation thresholds at intervals according to the Coflow information collected by the monitoring programs on the hosts. DeepAalo therefore adapts well to the large amount of uncertain traffic in a data center network. The following technical problems are specifically solved:

(1) A two-stage deep reinforcement learning system is designed. A monitoring program running on each end host collects Coflow information, and the central controller adjusts the set of degradation thresholds according to this information.

(2) An information-unaware Coflow scheduling mechanism is designed, relying only on partial information about Coflows whose scheduling has finished. With the DDPG algorithm, the queue degradation thresholds are updated automatically by learning from past decisions, independent of any preprogrammed model.

Complete technical scheme of the invention

The goal of DeepAalo is to design an information-unaware Coflow scheduling mechanism that minimizes the Coflow completion time (CCT) by automatically adjusting the queue thresholds. Unlike the Aalo mechanism, DeepAalo does not use a set of predefined thresholds; instead, it learns the policy from observations using Deep Reinforcement Learning (DRL).

System model

The DeepAalo system consists of two parts: a Monitoring System (MS) distributed on the hosts and a Central Controller (CC). A monitoring program running on each end host collects Coflow information, and the central controller adjusts the set of degradation thresholds according to this information. The central controller then observes the Coflow scheduling results as the reward of the reinforcement learning algorithm and further optimizes the queue degradation thresholds through that reward.

(1) Monitoring System (MS)

Inspired by the Aalo mechanism, we use a Coflow scheduling mechanism with multi-level queues. It consists of $K$ queues $(Q_1, Q_2, \dots, Q_K)$, whose priority decreases gradually from $Q_1$ to $Q_K$. When new Coflows arrive, they enter the highest-priority queue $Q_1$; when the number of bytes sent by a Coflow exceeds the queue threshold $\alpha_i$, the Coflow is demoted from $Q_i$ to $Q_{i+1}$. In addition, Coflows are removed from the queues when their scheduling completes. In this multi-level queue mechanism, the Monitoring System (MS) consists of monitoring programs distributed over many hosts, which collect the sizes and completion times of completed Coflows and report the collected Coflow information to the Central Controller (CC) at intervals.
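A minimal Python sketch of this demotion rule follows; the class name and the concrete threshold values are illustrative assumptions (in DeepAalo the thresholds come from the central controller).

```python
# Sketch of the multi-level queue bookkeeping described above.

class CoflowQueues:
    def __init__(self, thresholds):
        self.thresholds = thresholds      # K-1 demotion thresholds for K queues
        self.level = {}                   # Coflow id -> current queue index (0 = Q_1)

    def on_arrival(self, coflow_id):
        self.level[coflow_id] = 0         # new Coflows enter the highest-priority queue Q_1

    def on_bytes_sent(self, coflow_id, total_bytes_sent):
        i = self.level[coflow_id]
        # demote while the cumulative bytes sent exceed the current queue's threshold
        while i < len(self.thresholds) and total_bytes_sent > self.thresholds[i]:
            i += 1
        self.level[coflow_id] = i

    def on_completion(self, coflow_id):
        self.level.pop(coflow_id, None)   # completed Coflows leave the queues

# Example: 4 queues with exponentially spaced thresholds (10 MB, 100 MB, 1 GB).
q = CoflowQueues([10 * 2**20, 100 * 2**20, 1024 * 2**20])
q.on_arrival("shuffle-42")
q.on_bytes_sent("shuffle-42", 50 * 2**20)
print(q.level["shuffle-42"])              # 1, i.e. the Coflow now sits in Q_2
```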

(2) Central Controller (CC)

The central controller runs a reinforcement learning program to make decisions automatically. Specifically, the Deep Deterministic Policy Gradient (DDPG) algorithm is used to generate a set of degradation thresholds $\{\alpha_1, \alpha_2, \dots, \alpha_K\}$.

Design of reinforcement learning algorithm

(1) State space

Under the assumptions made by Aalo and other related mechanisms, the network fabric of the data center is abstracted as a non-blocking switch, and only its ingress and egress ports are considered. We therefore omit the load on the paths, and the defined state space contains only the state of the Coflows. Let $c_j^t$ denote the $j$-th Coflow that completes its transfer in the $t$-th time step. In our model, the state space is represented as the set of all Coflows completed in the data center within the $t$-th time step:

$$ s_t = \{c_1^t, c_2^t, \dots\} $$

A Coflow is identified by its five-tuple $(N_f, S_f(\min), S_f(\max), S_f(\mathrm{ave}), S_C)$, where $N_f$ denotes the number of sub-flows in the Coflow and the other four attributes denote, respectively, the minimum sub-flow size, the maximum sub-flow size, the average sub-flow size, and the total number of bytes of the Coflow.
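The five-tuple can be computed directly from the per-sub-flow byte counts. The sketch below, with illustrative function names, builds the state from the Coflows completed in one time step.

```python
# Sketch of the five-tuple state features (N_f, S_f(min), S_f(max), S_f(ave), S_C).

def coflow_features(substream_bytes):
    """substream_bytes: bytes carried by each sub-flow of one completed Coflow."""
    n_f = len(substream_bytes)
    return (
        n_f,                              # number of sub-flows N_f
        min(substream_bytes),             # smallest sub-flow S_f(min)
        max(substream_bytes),             # largest sub-flow S_f(max)
        sum(substream_bytes) / n_f,       # average sub-flow size S_f(ave)
        sum(substream_bytes),             # total Coflow bytes S_C
    )

def build_state(completed_coflows):
    """State s_t: the five-tuples of every Coflow completed in time step t."""
    return [coflow_features(c) for c in completed_coflows]

# Example: two Coflows finished in this time step.
print(build_state([[4e6, 6e6, 10e6], [1e6, 1e6]]))
```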

(2) Action space

At the $t$-th time step, after receiving the Coflow information collected by the monitoring program, the central controller outputs a set of queue degradation thresholds

$$ a_t = \{\alpha_1^t, \alpha_2^t, \dots, \alpha_K^t\}, $$

where $\alpha_i^t$ denotes the threshold of the $i$-th queue at time step $t$.
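The source does not specify how raw actor outputs are turned into valid thresholds; one plausible mapping, sketched below under that assumption, exponentiates the outputs and accumulates them so the resulting thresholds are positive and strictly increasing (the byte scale is also an assumption).

```python
import numpy as np

# Hypothetical mapping from raw actor outputs to increasing byte thresholds.

def to_thresholds(raw_action, min_bytes=1 * 2**20):
    steps = np.exp(np.asarray(raw_action, dtype=float))  # positive increments
    return min_bytes * np.cumsum(steps)                  # alpha_1 < alpha_2 < ...

print(to_thresholds([0.0, 1.0, 2.0]))   # three strictly increasing thresholds in bytes
```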

(3) Reward signal

After applying the thresholds $a_t$, DeepAalo collects the average completion time of the completed Coflows to calculate the reward. The goal of DeepAalo is to minimize the Coflow completion time, so we design the reward signal as:

$$ r_t = \sum_{c \in C_E^t} \frac{S_c}{T_c} \;-\; \sum_{c \in C_E^{t-1}} \frac{S_c}{T_c} \qquad (1) $$

where $S_c / T_c$ denotes the total number of bytes each completed Coflow $c$ transmits per unit time and $C_E^t$ denotes the set of all Coflows completed at time step $t$. This formula shows that if a new action delivers more traffic per unit time than the previous action, the current action receives a positive feedback reward. The purpose of this reward is to maximize the average throughput of the entire network.
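Formula (1) can be evaluated from the sizes and completion times of the Coflows finished in the two most recent time steps; in this sketch the (S_c, T_c) tuple layout is an assumption.

```python
# Sketch of the reward in formula (1): change in delivered bytes per unit time.

def throughput(completed):
    """completed: list of (S_c, T_c) pairs for the Coflows in C_E of one step."""
    return sum(s_c / t_c for s_c, t_c in completed)

def reward(completed_t, completed_t_minus_1):
    # positive when the new thresholds deliver more traffic per unit time
    return throughput(completed_t) - throughput(completed_t_minus_1)

print(reward([(10e6, 2.0), (4e6, 1.0)], [(8e6, 2.0)]))   # 9e6 - 4e6 = 5e6
```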

Training algorithm

To achieve better performance, DeepAalo trains the neural network using the DDPG algorithm. We represent the policy as a neural network that takes the set of completed Coflows as the algorithm input and outputs a set of queue degradation thresholds. At each time step, the neural network receives the latest state $s_t$ from the hosts and stores the tuple $(s_{t+1}, s_t, a_t, r_t)$ in its buffer for the next step of learning, where $s_{t+1}$ and $r_t$ are calculated in the next training step. By comparing the throughput of the whole network at time step $t$ with that at time step $t-1$, we obtain the reward signal $r_t$ at time step $t$ through formula (1). The central controller receives the reward signal (negative or positive) and gets feedback on the action generated at time step $t$. In the update step, the actor-critic network is trained continuously with the DDPG algorithm, finally yielding a good training result.
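The interaction loop can be sketched as follows; `agent` and `monitor` are hypothetical interfaces not named in the source (any standard DDPG implementation could back `agent`), and only the transition bookkeeping follows the text.

```python
# Sketch of the controller's interaction loop, under the stated assumptions.

def control_loop(agent, monitor, num_steps):
    prev = None                                    # (s_t, a_t, throughput) awaiting its reward
    for t in range(num_steps):
        coflows = monitor.collect()                # per-Coflow tuples (N_f, S_min, S_max, S_ave, S_C, T_c)
        s_t = [c[:5] for c in coflows]             # five-tuple state of each completed Coflow
        tput = sum(c[4] / c[5] for c in coflows)   # bytes per unit time, summed over C_E
        if prev is not None:
            s_prev, a_prev, tput_prev = prev
            r = tput - tput_prev                   # reward of formula (1)
            agent.store(s_prev, a_prev, r, s_t)    # tuple (s_t, a_t, r_t, s_{t+1}) into the buffer
            agent.update()                         # one DDPG actor-critic update
        a_t = agent.act(s_t)                       # a set of queue degradation thresholds
        monitor.apply_thresholds(a_t)              # push the new thresholds to the end hosts
        prev = (s_t, a_t, tput)
```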

The sign that training has ended is that the reward curve converges and no longer changes during the experiment. In terms of the policy, once the policy no longer changes, a reinforcement learning model with stable parameters has been trained; at that point training is finished, the output thresholds are optimal for that state, and the Coflow completion time reaches its minimum.

The technical scheme of the invention brings the following beneficial effects:

(1) Effect under a stable traffic distribution

Under a stable traffic distribution (fixed Coflow size distribution and network load), the average Coflow completion time of DeepAalo is 1.37 times lower than that of the Aalo mechanism after a training process of several hours. In particular, when scheduling larger Coflows, the performance of the DeepAalo mechanism is much better than Aalo's, with the average Coflow completion time reduced by up to 28.01%.

(2) Effect of network load variation

For the static-threshold scheme Aalo, performance degrades rapidly when the traffic characteristics do not match the parameter settings. DeepAalo exhibits good adaptability and stability when traffic characteristics change, because it can dynamically adjust the priorities of Coflows by changing the queue thresholds, thereby obtaining better Coflow scheduling results.

The application mode of the present invention can be adjusted according to the actual situation and is not intended to limit the present invention. The technical scheme provided by the invention has been described in detail above; the description of the embodiments is intended only to aid in understanding the method of the present invention.
