Dynamic energy consumption management method for distributed computing clusters based on data coverage sets

Document No.: 1390736    Publication date: 2020-02-28

Reading note: this technology, "A dynamic energy consumption management method for distributed computing clusters based on data coverage sets" (一种基于数据覆盖集的分布式运算集群动态能耗管理方法), was designed by 王培健, 齐勇, 侯迪, 林锦炜, 田真, 李文涛 and 赵文嘉 on 2019-10-28. Abstract: A dynamic energy consumption management method for distributed computing clusters based on data coverage sets comprises the following steps. Step 1, preparation stage: the cluster is divided into a number of mutually different data coverage sets, and the union of all data coverage sets must cover every node in the cluster. Step 2, working stage: when the cluster is started, one data coverage set is selected from the list and all of its nodes are enabled; after start-up is complete, the number of working nodes outside the data coverage set is adjusted dynamically according to demand. Step 3, adjustment stage: after the cluster has run for a period of time, the data coverage set in use is replaced; the switching interval should be measured in days and combined with the shutdown maintenance of the cluster, that is, the data coverage set is replaced during the cluster's regular maintenance. By dynamically adjusting the number of working nodes, the invention reduces energy consumption and equipment wear while guaranteeing data availability, and thereby greatly reduces operating cost.

1. A dynamic energy consumption management method for a distributed computing cluster based on data coverage sets, characterized by comprising the following steps:

step 1, preparation stage: dividing the cluster into a plurality of mutually different data coverage sets, the division requiring that the union of all data coverage sets cover every node in the cluster;

step 2, working stage: when the cluster is started, selecting one data coverage set from the list and enabling all nodes in it; after start-up is complete, dynamically adjusting the number of working nodes outside the data coverage set according to demand;

step 3, adjustment stage: after the cluster has run for a period of time, replacing the data coverage set in use; the switching interval is measured in days and is combined with the shutdown maintenance of the cluster, that is, the data coverage set is replaced during the cluster's regular maintenance.

2. The dynamic energy consumption management method for a distributed computing cluster based on data coverage sets as claimed in claim 1, wherein step 1 specifically comprises:

1) assuming that the cluster comprises S racks, dividing each rack equally into W areas, and numbering the resulting S × W areas;

2) setting the size of each data coverage set to n areas, the area numbers of the successive data coverage sets being (1 to n), (2 to n+1), (3 to n+2), …, (WS-1 to WS, 1 to n-2) and (WS, 1 to n-1), wrapping around to area 1 after area WS;

3) after the division is complete, the cluster contains W × S data coverage sets, and the maximum overlap ratio between two different data coverage sets is 1 - 1/n;

4) after the division of the data coverage sets is complete, storing the result in the master node for later scheduling.

3. The dynamic energy consumption management method for a distributed computing cluster based on data coverage sets as claimed in claim 2, wherein n is selected such that n is not equal to S; in the division, the size of a single data coverage set is 10% to 30% of the whole cluster; and each data coverage set is required to contain a copy of every data block, so that the minimum number of copies of each data block is [WS/n] + 1.

4. The dynamic energy consumption management method for a distributed computing cluster based on data coverage sets as claimed in claim 1, wherein the adjustment in step 2 is performed as follows:

1) first, the task upper limit of a single server is tested in advance, and the total service capacity of the currently available nodes is calculated at run time; the service capacity of the cluster in its initial state depends on the size of the data coverage set in use;

2) during operation, if the master node detects that user demand on the cluster has increased beyond the load capacity of the currently available nodes, dormant nodes are woken; if the master node detects that user demand is far below the current service capacity, the number of working nodes is reduced, but nodes in the data coverage set are not shut down; if load and demand are roughly balanced, the number of working nodes is not adjusted;

3) when selecting nodes to enable, nodes that store more of the data blocks needed by the current user/application are woken first.

5. The dynamic energy consumption management method for a distributed computing cluster based on data coverage sets as claimed in claim 1, wherein the replacement in step 3 proceeds as follows:

1) waking the nodes in the cluster that are still dormant, synchronizing data, and updating the data on the woken nodes to the latest version;

2) selecting from the data coverage set list the set to be used in the next period: comparing every data coverage set in the list with the previously used set, calculating the size of the overlap between each set and the previously used set, and randomly selecting the next set from among those with the smallest overlap;

3) estimating, from historical data, the service demand at the time the cluster restarts, determining the number of machines that need to be started, and selecting the corresponding number of additional machines outside the selected data coverage set;

4) when restarting the cluster, starting the nodes in the data coverage set selected in step 2) and the additional working nodes selected in step 3) in place of the previously used set.

Technical Field

The invention belongs to the field of data center energy consumption management, and in particular relates to a dynamic energy consumption management method for distributed computing clusters based on data coverage sets.

Background

Distributed computing frameworks such as Apache Hadoop, much of whose early development took place at Yahoo, can process vast amounts of data efficiently. Hadoop relies on the Hadoop Distributed File System (HDFS), a distributed file system with high fault tolerance. HDFS is designed to run on large numbers of inexpensive machines. It distributes multiple copies of each data block across them, which gives users fast data access and allows service to continue from the remaining copies when some machines fail.

HDFS adopts a master/slave node structure; a typical architecture is shown in Fig. 1. Each cluster comprises one master node and a plurality of data nodes. The master node manages the file system namespace and users' file access; the data nodes, which store the data blocks, are distributed across different racks and serve data reads and writes directly.

In HDFS, a file is generally divided into data blocks of a fixed size that are distributed among a plurality of data nodes, and the master node stores the distribution information of each file's data blocks. A user sends file-level requests (open, close, rename, delete, and so on) to the master node. When reading or writing file data is involved, the master node returns the location of the target data block in the cluster, and the user then interacts directly with the corresponding data nodes to complete the operation.

In a distributed computing cluster, a sufficient number of data nodes generally has to run at the same time to ensure data availability, so some nodes cannot go to sleep even though they have no computing task.

In the traditional approach, data availability is not considered when storage nodes are selected for data blocks, so the distribution of data blocks is highly random. The set of nodes required to guarantee data availability is therefore difficult to determine, and a large number of nodes cannot enter dormancy.

To reduce cluster power consumption and the number of machines that must be enabled, a "data coverage set" can be used instead to ensure data availability: a subset of the cluster that contains at least one copy of every data block is kept running. Once a data coverage set is adopted, nodes outside the set can be started or shut down flexibly according to performance requirements, without regard to data availability.
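The defining property of a data coverage set can be checked mechanically. The following is a minimal sketch, assuming a hypothetical in-memory map from block IDs to the nodes holding their copies; the function name, node names, and block IDs are illustrative and do not come from HDFS or from the patent.

```python
from typing import Dict, Set


def is_coverage_set(nodes: Set[str], block_locations: Dict[str, Set[str]]) -> bool:
    """Return True if every block has at least one copy on a node in `nodes`."""
    return all(replicas & nodes for replicas in block_locations.values())


# Hypothetical block-to-node placement, for illustration only.
block_locations = {
    "blk_1": {"node_a", "node_c"},
    "blk_2": {"node_b", "node_c"},
    "blk_3": {"node_a", "node_d"},
}

print(is_coverage_set({"node_a", "node_c"}, block_locations))  # True
print(is_coverage_set({"node_b", "node_d"}, block_locations))  # False: blk_1 is missing
```

In this toy placement, keeping only node_a and node_c awake leaves every block readable, so node_b and node_d could sleep without affecting availability.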

However, in current practice a cluster that adopts data coverage sets usually establishes only one, so the availability of the cluster depends heavily on the reliability of the machines in that set. Running the same data coverage set continuously may also shorten the service life and degrade the performance of those machines more quickly.

Disclosure of Invention

The invention aims to provide a dynamic energy consumption management method for distributed computing clusters based on data coverage sets, so as to solve the above problems.

To achieve this purpose, the invention adopts the following technical scheme:

A dynamic energy consumption management method for distributed computing clusters based on data coverage sets comprises the following steps:

step 1, preparation stage: dividing the cluster into a plurality of mutually different data coverage sets, the division requiring that the union of all data coverage sets cover every node in the cluster;

step 2, working stage: when the cluster is started, selecting one data coverage set from the list and enabling all nodes in it; after start-up is complete, dynamically adjusting the number of working nodes outside the data coverage set according to demand;

step 3, adjustment stage: after the cluster has run for a period of time, replacing the data coverage set in use; the switching interval should be measured in days and combined with the shutdown maintenance of the cluster, that is, the data coverage set is replaced during the cluster's regular maintenance.

Further, step 1 specifically comprises the following steps:

1) assuming that the cluster comprises S racks, dividing each rack equally into W areas, and numbering the resulting S × W areas;

2) setting the size of each data coverage set to n areas, the area numbers of the successive data coverage sets being (1 to n), (2 to n+1), (3 to n+2), …, (WS, 1 to n-1), wrapping around to area 1 after area WS;

3) after the division is complete, the cluster contains W × S data coverage sets, and the maximum overlap ratio between two different data coverage sets is 1 - 1/n;

4) after the division of the data coverage sets is complete, storing the result in the master node for later scheduling.

Further, n is selected such that n is not equal to S; in the division, the size of a single data coverage set is 10% to 30% of the whole cluster; each data coverage set is required to contain a copy of every data block, so the minimum number of copies is [WS/n] + 1.

Further, the adjustment in step 2 is performed as follows:

1) first, the task upper limit of a single server is tested in advance, and the total service capacity of the currently available nodes is obtained at run time; the service capacity of the cluster in its initial state depends on the size of the data coverage set in use;

2) during operation, if the master node detects that user demand on the cluster has increased beyond the load capacity of the currently available nodes, dormant nodes are woken; if the master node detects that user demand is far below the current service capacity, the number of working nodes is reduced, but nodes in the data coverage set are not shut down; if load and demand are roughly balanced, the number of working nodes is not adjusted;

3) when selecting nodes to enable (or shut down), nodes that store more of the data blocks needed by the current user/application should be woken first.

Further, the replacement in step 3 proceeds as follows:

1) waking the nodes in the cluster that are still dormant, synchronizing data, and updating the data on the woken nodes to the latest version;

2) selecting from the data coverage set list the set to be used in the next period: comparing every data coverage set in the list with the previously used set, calculating the size of the overlap between each set and the previously used set, and randomly selecting the next set from among those with the smallest overlap;

3) estimating, from historical data, the service demand at the time the cluster restarts, determining the number of machines that need to be started, and selecting the corresponding number of additional machines outside the selected data coverage set;

4) when restarting the cluster, starting the nodes in the data coverage set selected in step 2) and the additional working nodes selected in step 3) in place of the previously used set.

Compared with the prior art, the invention has the following technical effects:

By introducing the concept of a minimum coverage set, the invention ensures that at every moment each data block in the data set has at least one copy on a running machine, which keeps the data center working normally. On this basis, data nodes are started on demand: with data availability guaranteed, starting additional non-essential nodes increases the number of working nodes and improves cluster performance and processing speed; conversely, shutting down non-essential nodes reduces the number of working nodes, which lowers the cluster's overall energy consumption and equipment wear and greatly reduces electricity and equipment costs.

The invention also provides a flexible strategy for meeting performance needs when operating cost is not the primary consideration. When the data coverage sets are divided, choosing different set capacities yields different schemes; the larger the capacity of a data coverage set, the more nodes it contains and the greater the load it can handle.

Different schemes suit different application scenarios. When the load is relatively stable, choosing data coverage sets with a large capacity both meets the performance requirement and improves cluster stability. When the load changes frequently, choosing data coverage sets with a smaller capacity and starting or shutting down working nodes outside the set promptly to follow the load reduces energy consumption as far as possible while still meeting users' performance requirements.

Drawings

Fig. 1 is a schematic diagram of the architecture of the HDFS.

Fig. 2 is a flowchart of a distributed computing cluster dynamic energy consumption management method.

Fig. 3 is a schematic diagram of the area numbering according to the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings:

Referring to Fig. 2 and Fig. 3, a dynamic energy consumption management method for distributed computing clusters based on data coverage sets includes the following steps:

step 1, preparation stage: dividing the cluster into a plurality of mutually different data coverage sets, the division requiring that the union of all data coverage sets cover every node in the cluster;

step 2, working stage: when the cluster is started, selecting one data coverage set from the list and enabling all nodes in it; after start-up is complete, dynamically adjusting the number of working nodes outside the data coverage set according to demand;

step 3, adjustment stage: after the cluster has run for a period of time, replacing the data coverage set in use; the switching interval should be measured in days and combined with the shutdown maintenance of the cluster, that is, the data coverage set is replaced during the cluster's regular maintenance.

Step 1 specifically comprises the following steps:

1) assuming that the cluster comprises S racks, dividing each rack equally into W areas, and numbering the resulting S × W areas;

2) setting the size of each data coverage set to n areas, the area numbers of the successive data coverage sets being (1 to n), (2 to n+1), (3 to n+2), …, (WS, 1 to n-1), wrapping around to area 1 after area WS;

3) after the division is complete, the cluster contains W × S data coverage sets, and the maximum overlap ratio between two different data coverage sets is 1 - 1/n;

4) after the division of the data coverage sets is complete, storing the result in the master node for later scheduling.

n must be chosen so that it is not equal to S; otherwise all copies of a data block may end up in the same rack. In the division, a single data coverage set should make up 10% to 30% of the whole cluster. Dividing the cluster in this way also constrains the number of copies of each data block: every data coverage set must contain a copy of each data block, so the minimum number of copies is [WS/n] + 1.
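The division in step 1 can be sketched as a set of sliding windows over the area numbers that wrap around after area WS. This is an illustrative sketch only: the function names are hypothetical, the mapping from area numbers to physical racks (Fig. 3) is not modeled, and the parameters S = 4, W = 5, n = 3 are chosen simply so that one set is 15% of the cluster.

```python
from typing import List


def build_coverage_sets(S: int, W: int, n: int) -> List[List[int]]:
    """Return the W*S cyclic windows of n consecutive area numbers (1-based)."""
    total = S * W
    if not 1 <= n <= total:
        raise ValueError("n must be between 1 and S*W")
    if n == S:
        raise ValueError("the description requires n != S")
    return [[(start + offset) % total + 1 for offset in range(n)]
            for start in range(total)]


def max_overlap_ratio(sets: List[List[int]]) -> float:
    """Largest shared fraction between any two different coverage sets."""
    n = len(sets[0])
    return max(len(set(a) & set(b))
               for i, a in enumerate(sets) for b in sets[i + 1:]) / n


sets = build_coverage_sets(S=4, W=5, n=3)  # 20 areas, each set is 15% of them
print(len(sets))                 # number of coverage sets: W * S
print(sets[0], sets[-1])         # first set and the last, wrapped set
print(max_overlap_ratio(sets))   # equals 1 - 1/n for adjacent windows
```

With these parameters the last set is areas 20, 1 and 2, and two adjacent sets share 2 of their 3 areas, which matches the stated maximum overlap ratio of 1 - 1/n.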

Compared with the traditional way distributed clusters operate, this scheme requires, besides dynamically adding and removing working nodes, a modified approach to managing data copies in the cluster. While the cluster is running, the conventional mechanisms for creating and changing copies must be partly adjusted.

First, when storage nodes are selected for the copies of a newly created data block, at least one copy must exist in each data coverage set. Taking the division method above as an example: when a new data block is generated, a first copy is created on any machine in some area K among areas 1 to n. Further copies are then added on any machine in each of areas K+n, K+2n, … until every data coverage set contains a copy of the new data block. Under this scheme each area belongs to n different data coverage sets, and adding a copy every n areas ensures that the data coverage sets containing two consecutive copies do not overlap, so the requirement can be met by creating only [WS/n] + 1 copies.
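The placement rule can be sketched as a loop that adds a copy every n areas until every coverage set holds one. Two points here are assumptions rather than statements from the description: the step wraps around past area WS when the sets that straddle the end are still uncovered, and [WS/n] is read as integer (floor) division. The function name is hypothetical.

```python
from typing import List


def choose_replica_areas(total_areas: int, n: int, first_area: int) -> List[int]:
    """Pick areas K, K+n, K+2n, ... until every cyclic window of n areas has a copy."""
    if not 1 <= first_area <= n <= total_areas:
        raise ValueError("need 1 <= first_area <= n <= total_areas")
    # Every coverage set is a window of n consecutive areas, wrapping after the last.
    windows = [{(start + offset) % total_areas + 1 for offset in range(n)}
               for start in range(total_areas)]
    chosen: List[int] = []
    area = first_area
    while not all(window & set(chosen) for window in windows):
        chosen.append(area)
        # Assumption: step by n and wrap around after the last area.
        area = (area - 1 + n) % total_areas + 1
    return chosen


areas = choose_replica_areas(total_areas=20, n=3, first_area=3)
print(areas)                     # [3, 6, 9, 12, 15, 18, 1]
print(len(areas) <= 20 // 3 + 1)  # True: within the [WS/n] + 1 bound
```

In this example the copies at areas 3 through 18 leave the sets (19, 20, 1) and (20, 1, 2) uncovered, so one wrapped copy in area 1 is added, giving 7 copies in total.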

Second, when handling operations that involve the number of copies of a data block, the master node, in its reply to the client, counts only the copies on nodes that are currently running and ignores copies on dormant nodes, so as to avoid generating unnecessary new copies.

The adjustment in step 2 is performed as follows:

1) first, the task upper limit of a single server is tested in advance, and the total service capacity of the currently available nodes is obtained at run time; because the cluster initially enables only the nodes in the data coverage set, its service capacity in the initial state depends on the size of the data coverage set in use;

2) during operation, if the master node detects that user demand on the cluster has increased beyond the load capacity of the currently available nodes, dormant nodes are woken; if the master node detects that user demand is far below the current service capacity (that is, the gap between demand and service capacity exceeds the service capacity of one node), the number of working nodes is reduced, but nodes in the data coverage set are not shut down; if load and demand are roughly balanced, the number of working nodes is not adjusted;

3) because each node stores different data blocks, when selecting nodes to enable (or shut down), nodes that store more of the data blocks needed by the current user/application should be woken first.
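The three adjustment rules can be sketched as a single planning function. Several details are assumptions not stated in the description: capacity is treated as linear in the number of active nodes, the number of nodes to wake is the deficit rounded up, and nodes holding the fewest needed blocks are put to sleep first. All names are hypothetical.

```python
import math
from typing import Dict, List, Set


def plan_adjustment(demand: float, per_node_capacity: float, active: Set[str],
                    sleeping: Set[str], coverage: Set[str],
                    needed_blocks_on: Dict[str, int]) -> Dict[str, List[str]]:
    """Decide which nodes to wake or put to sleep; coverage-set nodes never sleep."""
    capacity = per_node_capacity * len(active)
    if demand > capacity:
        # Wake enough nodes to close the deficit, richest in needed blocks first.
        count = math.ceil((demand - capacity) / per_node_capacity)
        ranked = sorted(sleeping, key=lambda node: (-needed_blocks_on.get(node, 0), node))
        return {"wake": ranked[:count], "sleep": []}
    surplus = capacity - demand
    if surplus > per_node_capacity:
        # Shrink only when the surplus exceeds one node's capacity.
        count = int(surplus // per_node_capacity)
        # Assumption: nodes holding the fewest needed blocks go to sleep first.
        ranked = sorted(active - coverage, key=lambda node: (needed_blocks_on.get(node, 0), node))
        return {"wake": [], "sleep": ranked[:count]}
    return {"wake": [], "sleep": []}


active = {"c1", "c2", "w1"}          # c1, c2 form the coverage set in use
coverage = {"c1", "c2"}
sleeping = {"s1", "s2", "s3"}
needed = {"s1": 5, "s2": 9, "s3": 1, "w1": 2}

print(plan_adjustment(450, 100, active, sleeping, coverage, needed))  # wakes s2, then s1
print(plan_adjustment(50, 100, active, sleeping, coverage, needed))   # sleeps only w1
print(plan_adjustment(250, 100, active, sleeping, coverage, needed))  # no change
```

In the second call the surplus would allow two nodes to sleep, but only w1 lies outside the coverage set, so c1 and c2 stay on.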

The replacement in step 3 proceeds as follows:

1) waking the nodes in the cluster that are still dormant, synchronizing data, and updating the data on the woken nodes to the latest version;

2) selecting from the data coverage set list the set to be used in the next period: comparing every data coverage set in the list with the previously used set, calculating the size of the overlap between each set and the previously used set, and randomly selecting the next set from among those with the smallest overlap;

3) estimating, from historical data, the service demand at the time the cluster restarts, determining the number of machines that need to be started, and selecting the corresponding number of additional machines outside the selected data coverage set;

4) when restarting the cluster, starting the nodes in the data coverage set selected in step 2) and the additional working nodes selected in step 3) in place of the previously used set.
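Steps 2) and 3) of the replacement can be sketched as follows. The sketch assumes coverage sets are given as lists of area (or node) identifiers and that service capacity grows linearly with the number of machines; both function names and the example figures are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def pick_next_coverage_set(coverage_sets: Sequence[Sequence[int]],
                           previous: Sequence[int],
                           rng: Optional[random.Random] = None) -> List[int]:
    """Choose at random among the sets that overlap least with the previous one."""
    rng = rng or random.Random()
    prev = set(previous)
    overlaps = [len(set(candidate) & prev) for candidate in coverage_sets]
    smallest = min(overlaps)
    least_overlapping = [list(candidate)
                         for candidate, overlap in zip(coverage_sets, overlaps)
                         if overlap == smallest]
    return rng.choice(least_overlapping)


def extra_nodes_needed(expected_demand: float, per_node_capacity: float,
                       coverage_size: int) -> int:
    """Machines to start beyond the coverage set, assuming linear capacity."""
    return max(0, math.ceil(expected_demand / per_node_capacity) - coverage_size)


coverage_sets = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 1]]
chosen = pick_next_coverage_set(coverage_sets, previous=[1, 2], rng=random.Random(0))
print(chosen)                            # one of [3, 4], [4, 5], [5, 6]
print(extra_nodes_needed(750, 100, 5))   # 3
```

Choosing the least-overlapping set moves the always-on role to machines that were mostly idle in the previous period, which is the wear-levelling effect the description aims for.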
