Method, device and equipment for multi-node cluster ring communication and readable medium

Document No.: 1815843  Publication date: 2021-11-09  Views: 30  Language: Chinese

Note: This technique, "Method, device and equipment for multi-node cluster ring communication and readable medium", was designed and created by 罗建刚 on 2021-08-14. Its main content is as follows: The invention discloses a method for multi-node cluster ring communication, comprising: performing intra-node data integration across all GPUs inside the current node, and gathering the integrated single-node data into the first GPU and the last GPU; performing inter-node data integration between the first GPU of the current node and the last GPU of the previous adjacent node, and gathering the integrated multi-node data into the first GPU of the current node and the last GPU of the previous adjacent node; performing inter-node data integration between the last GPU of the current node and the first GPU of the next adjacent node, and gathering the integrated multi-node data into the last GPU of the current node and the first GPU of the next adjacent node; and broadcasting the data in the first GPU of the current node and the data in the last GPU of the current node to the other GPUs inside the current node. The invention also discloses an apparatus, a computer device, and a medium for multi-node cluster ring communication.

1. A method for multi-node cluster ring communication, comprising the steps of:

performing intra-node data integration across all GPUs inside the current node, and gathering the integrated single-node data into the first GPU and the last GPU;

performing inter-node data integration between the first GPU of the current node and the last GPU of the previous adjacent node, and gathering the integrated multi-node data into the first GPU of the current node and the last GPU of the previous adjacent node;

performing inter-node data integration between the last GPU of the current node and the first GPU of the next adjacent node, and gathering the integrated multi-node data into the last GPU of the current node and the first GPU of the next adjacent node; and

broadcasting the data in the first GPU of the current node and the data in the last GPU of the current node to the other GPUs inside the current node.

2. The method of multi-node cluster ring communication according to claim 1, further comprising:

dividing the data to be integrated into a plurality of preset data blocks, and calculating the sizes of the first data block and other data blocks based on the ratio of the inter-node data integration communication complexity to the intra-node data integration communication complexity.

3. The method of multi-node cluster ring communication according to claim 2, wherein the sizes of the other data blocks are equal, and the ratio of the size of the first data block to the size of the other data blocks is equal to the ratio of the inter-node data integration communication complexity to the intra-node data integration communication complexity.

4. The method of multi-node cluster ring communication of claim 1, wherein performing intra-node data integration for all GPUs within the current node comprises:

connecting all GPUs in the current node through NVSwitch, and performing intra-node data integration across all the GPUs.

5. The method of claim 1, wherein performing inter-node data integration between the first GPU of the current node and the last GPU of the previous adjacent node comprises: performing inter-node integration of the current data block between the first GPU of the current node and the last GPU of the previous adjacent node, while simultaneously performing intra-node integration of the next data block across all GPUs in the current node;

and wherein performing inter-node data integration between the last GPU of the current node and the first GPU of the next adjacent node comprises: performing inter-node integration of the current data block between the last GPU of the current node and the first GPU of the next adjacent node, while simultaneously performing intra-node integration of the next data block across all GPUs in the current node.

6. The method of claim 1, wherein performing inter-node data integration between the first GPU of the current node and the last GPU of the previous adjacent node comprises: connecting the first GPU of the current node and the last GPU of the previous adjacent node through an IB card, and performing inter-node data integration between them;

and wherein performing inter-node data integration between the last GPU of the current node and the first GPU of the next adjacent node comprises: connecting the last GPU of the current node and the first GPU of the next adjacent node through an IB card, and performing inter-node data integration between them.

7. The method of claim 1, wherein broadcasting data in a first GPU of the current node and data in a last GPU of the current node to other GPUs internal to the current node comprises:

determining whether the data in the first GPU of the current node is the same as the data in the last GPU of the current node;

if the data is the same, broadcasting the data in the first GPU of the current node to the first half of the other GPUs inside the current node, and broadcasting the data in the last GPU of the current node to the second half of the other GPUs inside the current node; and

if the data is different, broadcasting the data in the first GPU of the current node and the data in the last GPU of the current node to all the other GPUs inside the current node, respectively.

8. An apparatus for multi-node cluster ring communication, comprising:

a first module configured to perform intra-node data integration across all GPUs inside the current node, and gather the integrated single-node data into the first GPU and the last GPU;

a second module configured to perform inter-node data integration between the first GPU of the current node and the last GPU of the previous adjacent node, and gather the integrated multi-node data into the first GPU of the current node and the last GPU of the previous adjacent node;

a third module configured to perform inter-node data integration between the last GPU of the current node and the first GPU of the next adjacent node, and gather the integrated multi-node data into the last GPU of the current node and the first GPU of the next adjacent node; and

a fourth module configured to broadcast the data in the first GPU of the current node and the data in the last GPU of the current node to the other GPUs inside the current node.

9. A computer device, comprising:

at least one processor; and

a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.

Technical Field

The present invention relates to the field of data transmission technologies, and in particular, to a method, an apparatus, a device, and a readable medium for multi-node cluster ring communication.

Background

Increasingly sophisticated machine learning algorithms, such as Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs), achieve unprecedented performance in many practical applications and solve many long-standing challenges, such as speech recognition, text processing, and image recognition. However, training on a single GPU (Graphics Processing Unit) often takes a long time, and this low efficiency limits their application to a certain extent.

The most widely used way to reduce training time is data-parallel training, in which each GPU holds a complete copy of the model parameters and frequently exchanges parameters with the other GPUs participating in the training. This exchange incurs significant communication cost and becomes a system bottleneck when communication is slow. On a multi-node GPU server in particular, communication often goes through an IB (InfiniBand) card at only 25 GB/s or even less, which greatly increases the training time of a deep learning model, whereas the internal communication speed of an 8-GPU node can reach 250 GB/s. The low inter-node communication efficiency thus also wastes the internal communication bandwidth.

The communication bottleneck in training can be attacked from two directions, hardware and software. On the hardware side, more advanced GPU interconnect technologies are adopted, such as PCIE, NVLINK, and NVSWITCH; NVLINK can provide up to 300 GB/s of bandwidth. On the software side, modern communication libraries are employed, such as NVIDIA's collective communication library (NCCL), Uber's Horovod, and Baidu's Ring AllReduce.

Among existing communication methods, the ring communication method and the double binary tree method are the most widely applied. The ring method can make effective use of pipelining, so it scales well and is preferred when transmitting large data volumes. The double binary tree method is typically used when the topology is complicated, an effective communication ring cannot be established, and the data volume is small.

Disclosure of Invention

The existing ring communication algorithm is a common method for GPU communication and is typically used when the data volume is large. Fig. 1 is a schematic diagram of a prior-art ring communication algorithm. As shown in fig. 1, each GPU only receives data from its left neighbor and sends data to its right neighbor, so the data flows around the ring formed by the GPUs.

The ALL_Reduce scheme is the most common communication scheme in deep learning. Taking Ring_Allreduce as an example, fig. 2 shows a schematic diagram of the prior-art Ring_Allreduce algorithm. As shown in fig. 2, the process is divided into two major steps. The first step is scatter_reduce: the GPUs gradually exchange and fuse gradients with one another until each GPU holds one fully fused part of the complete gradient. The second step is All_gather: the GPUs gradually exchange these partial results until every GPU holds the complete fused gradient. In the first step, the total data is divided into k parts, with 1/k of the total data transmitted each time; each part is equally divided into n blocks, left and right neighbors are designated, and n-1 reduce operations are executed, where in the ith operation GPU j sends its (j-i) % n-th block to its right neighbor, receives the (j-i-1) % n-th block from its left neighbor, and performs a reduce operation on the received data. In the second step, the reduced block obtained by each GPU is sent to every other GPU by the same ring communication method.
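The scatter_reduce and All_gather steps above can be sketched as a pure-Python simulation. This is an illustrative toy model, not NCCL's implementation: GPU buffers are plain lists, and a neighbor's "send" is modeled by reading its buffer from a snapshot of the previous round.

```python
def ring_allreduce(buffers):
    """buffers[j] is GPU j's data, split into n chunks (n = number of GPUs)."""
    n = len(buffers)
    # Step 1: scatter_reduce. In round i, GPU j reduces chunk (j - i - 1) % n
    # with the copy sent by its left neighbor (which sends its (j - i - 1) % n
    # chunk, i.e. chunk ((j-1) - i) % n of the neighbor's own numbering).
    for i in range(n - 1):
        snapshot = [[chunk[:] for chunk in b] for b in buffers]
        for j in range(n):
            src = (j - 1) % n              # left neighbor
            c = (j - i - 1) % n            # chunk being reduced on GPU j
            buffers[j][c] = [a + b for a, b in zip(buffers[j][c], snapshot[src][c])]
    # Step 2: All_gather. The fully reduced chunks circulate around the ring
    # until every GPU holds all of them.
    for i in range(n - 1):
        snapshot = [[chunk[:] for chunk in b] for b in buffers]
        for j in range(n):
            src = (j - 1) % n
            c = (j - i) % n                # chunk being copied on GPU j
            buffers[j][c] = snapshot[src][c][:]
    return buffers
```

With 3 simulated GPUs holding 3 one-element chunks each, every GPU ends up with the element-wise sum of each chunk across all GPUs.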

The ring communication method can effectively exploit pipelining and scales well to many GPUs. However, under the limitation of low-speed networks, such as low-speed IB card connections, the transmission speed is only about 1 GB/s, which has gradually become a bottleneck for GPU computation. Multi-node transmission typically goes over the network, which restricts GPU interactive computation even more severely.

Large-scale data-parallel training of deep learning brings ever larger time overhead, and the problem to be solved is how to make reasonable, efficient use of the low-speed inter-node network given the high cost of high-speed transmission hardware. In large-scale training, the low transmission efficiency of the inter-node IB network greatly wastes the high-speed intra-node bandwidth, and the inter-node IB network has gradually become the bottleneck of large-scale neural network training. In the prior art, NCCL (currently the most popular and widely used GPU communication library, which mainly adopts the ring communication method for large data volumes) builds rings according to the number of IB cards when facing a multi-node GPU server; when the number of IB cards in a node is small, this often wastes bandwidth inside the node. In addition, during ring communication the GPUs inside a node finish their transmission first and then wait for the inter-node transmission, which wastes further bandwidth.

In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a readable medium for multi-node cluster ring communication, which extend a new transmission method based on a ring communication algorithm for a specific multi-node GPU server, and effectively avoid the problem of communication bandwidth waste between nodes.

Based on the above object, an aspect of the embodiments of the present invention provides a method for multi-node cluster ring communication, including the following steps: performing intra-node data integration across all GPUs inside the current node, and gathering the integrated single-node data into the first GPU and the last GPU; performing inter-node data integration between the first GPU of the current node and the last GPU of the previous adjacent node, and gathering the integrated multi-node data into the first GPU of the current node and the last GPU of the previous adjacent node; performing inter-node data integration between the last GPU of the current node and the first GPU of the next adjacent node, and gathering the integrated multi-node data into the last GPU of the current node and the first GPU of the next adjacent node; and broadcasting the data in the first GPU of the current node and the data in the last GPU of the current node to the other GPUs inside the current node.
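As a hedged illustration of the four steps above, the toy model below reduces integer values with addition. The function name `hierarchical_allreduce` and the collapsing of steps 2 and 3 into a single global reduction over each node's first and last GPU are simplifications for clarity, not the patent's exact inter-node ring protocol.

```python
def hierarchical_allreduce(nodes):
    """nodes[i] is a list of per-GPU values for node i; reduction is '+'."""
    # Step 1: intra-node integration, gathered into the first and last GPU.
    for node in nodes:
        s = sum(node)
        node[0] = node[-1] = s
    # Steps 2-3: inter-node integration over the first/last GPUs of adjacent
    # nodes (collapsed here into one global reduction for clarity).
    total = sum(node[0] for node in nodes)
    for node in nodes:
        node[0] = node[-1] = total
    # Step 4: broadcast from the first and last GPU to the remaining GPUs;
    # each of the two source GPUs serves half of the node.
    for node in nodes:
        for g in range(len(node)):
            node[g] = node[0] if g < len(node) // 2 else node[-1]
    return nodes
```

Running it on two simulated 4-GPU nodes leaves every GPU holding the global sum.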

In some embodiments, the method further comprises: dividing the data to be integrated into a plurality of preset data blocks, and calculating the sizes of the first data block and other data blocks based on the ratio of the inter-node data integration communication complexity to the intra-node data integration communication complexity.

In some embodiments, the sizes of the other data blocks are equal, and a ratio of the size of the first data block to the size of the other data blocks is equal to a ratio of the inter-node data integration communication complexity to the intra-node data integration communication complexity.

In some embodiments, performing intra-node data consolidation for all GPUs inside the current node comprises: and connecting all GPUs in the current node through NVswitch, and performing intra-node data integration on all the GPUs.

In some embodiments, performing inter-node data integration between the first GPU of the current node and the last GPU of the previous neighboring node includes: performing current data block integration between nodes on the first GPU of the current node and the last GPU of the last adjacent node, and simultaneously performing intra-node next data block integration on all GPUs in the current node; performing inter-node data integration on the last GPU of the current node and the first GPU of the next adjacent node comprises the following steps: and performing current data block integration between nodes on the last GPU of the current node and the first GPU of the next adjacent node, and performing intra-node next data block integration on all GPUs in the current node.

In some embodiments, performing inter-node data integration between the first GPU of the current node and the last GPU of the previous adjacent node includes: connecting the first GPU of the current node and the last GPU of the previous adjacent node through an IB card, and performing inter-node data integration between them. Performing inter-node data integration between the last GPU of the current node and the first GPU of the next adjacent node includes: connecting the last GPU of the current node and the first GPU of the next adjacent node through an IB card, and performing inter-node data integration between them.

In some embodiments, broadcasting the data in the first GPU of the current node and the data in the last GPU of the current node to the other GPUs inside the current node comprises: determining whether the data in the first GPU of the current node is the same as the data in the last GPU of the current node; if the data is the same, broadcasting the data in the first GPU to the first half of the other GPUs inside the current node and the data in the last GPU to the second half; and if the data is different, broadcasting the data in the first GPU and the data in the last GPU to all the other GPUs inside the current node, respectively.

In another aspect of the embodiments of the present invention, an apparatus for multi-node cluster ring communication is also provided, including: a first module configured to perform intra-node data integration across all GPUs inside the current node and gather the integrated single-node data into the first GPU and the last GPU; a second module configured to perform inter-node data integration between the first GPU of the current node and the last GPU of the previous adjacent node, and gather the integrated multi-node data into the first GPU of the current node and the last GPU of the previous adjacent node; a third module configured to perform inter-node data integration between the last GPU of the current node and the first GPU of the next adjacent node, and gather the integrated multi-node data into the last GPU of the current node and the first GPU of the next adjacent node; and a fourth module configured to broadcast the data in the first GPU of the current node and the data in the last GPU of the current node to the other GPUs inside the current node.

In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method.

In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored that, when executed by a processor, implements the above method steps.

The invention has at least the following beneficial technical effects: aiming at a specific multi-node GPU server, a new transmission method is expanded on the basis of a ring communication algorithm, and the problem of communication bandwidth waste among nodes is effectively avoided.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.

FIG. 1 is a schematic diagram of a prior art ring communication algorithm;

FIG. 2 is a schematic diagram of a prior art Ring _ Allreduce algorithm;

FIG. 3 is a schematic diagram of an embodiment of a method for multi-node cluster ring communication provided in the present invention;

FIG. 4 is a GPU server node architecture diagram of the multi-node cluster ring communication method provided by the present invention;

FIG. 5 is a schematic diagram of an embodiment of an apparatus for multi-node cluster ring communication provided in the present invention;

FIG. 6 is a schematic diagram of an embodiment of a computer device provided by the present invention;

FIG. 7 is a schematic diagram of an embodiment of a computer-readable storage medium provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.

It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used for distinguishing two entities with the same name but different names or different parameters, and it is understood that "first" and "second" are only used for convenience of expression and should not be construed as limitations to the embodiments of the present invention, and the descriptions thereof in the following embodiments are omitted.

In view of the foregoing, a first aspect of the embodiments of the present invention provides an embodiment of a method for multi-node cluster ring communication. Fig. 3 is a schematic diagram illustrating an embodiment of a method for multi-node cluster ring communication provided by the present invention. As shown in fig. 3, the embodiment of the present invention includes the following steps:

s01, performing intra-node data integration on all GPUs in the current node, and summarizing the single-node data obtained through integration into a first GPU and a last GPU;

s02, performing inter-node data integration on the head GPU of the current node and the last GPU of the previous adjacent node, and summarizing multi-node data obtained through integration into the head GPU of the current node and the last GPU of the previous adjacent node;

s03, performing inter-node data integration on the last GPU of the current node and the first GPU of the next adjacent node, and summarizing multi-node data obtained through integration into the last GPU of the current node and the first GPU of the next adjacent node; and

and S04, broadcasting and sending the data in the first GPU of the current node and the data in the last GPU of the current node to other GPUs inside the current node.

In this embodiment, multi-layer ring communication is established on top of the ring or tree communication method, reducing the number of inter-node GPU communications so as to improve the effective communication bandwidth; for the convenience of deep learning practitioners, the method can be integrated into NCCL. Fig. 4 is a GPU server node architecture diagram of the multi-node cluster ring communication method provided by the invention. As shown in fig. 4, the GPUs inside a node are connected through NVSWITCH, with an ideal transmission rate of 300 GB/s and a measured rate of 250 GB/s. Each server node is connected to other nodes through two IB cards, whose combined transmission rate is 25 GB/s.
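For orientation, the bandwidth figures above imply roughly a tenfold gap between intra-node and inter-node transfer, which appears to be the basis for the later assumption that the intra-node unit transfer time is beta/10 (the variable names below are illustrative):

```python
# Figures from the architecture described above.
intra_bw = 250.0  # GB/s, measured NVSWITCH bandwidth inside one node
inter_bw = 25.0   # GB/s, combined bandwidth of the two inter-node IB cards

# Per-unit transfer time is the reciprocal of bandwidth, so the intra-node
# unit time (beta1) is about one tenth of the inter-node unit time (beta).
ratio = intra_bw / inter_bw
```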

In the current NCCL ring communication algorithm, limited by the IB card connection, the ALL_REDUCE communication is measured at only 24 GB/s. Two rings are built inside each node and connected to the outside through the IB cards, i.e., GPU0 and GPU7 of node 1 are connected with GPU0 and GPU7 of node 2 to form two large rings. The communication complexity is 2(p-1)α + 2nβ + nγ - (2nβ + nγ)/p, where p is the number of GPUs, α is the latency of a GPU transmission, β is the transmission time per unit of data, and γ is the reduce computation time. Since γ is very small it is ignored here for convenience, and the latency term is negligible when the transmitted data volume is large, so the communication complexity can be simplified to 2nβ(p-1)/p.
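The complexity expression above can be written as a small helper (a sketch of the stated cost model with the parameter names from the text, not a measured quantity):

```python
def ring_allreduce_cost(n, beta, p, alpha=0.0, gamma=0.0):
    """Cost model from the text:
    2(p-1)*alpha + 2n*beta + n*gamma - (2n*beta + n*gamma)/p.

    With alpha and gamma ignored this reduces to the simplified form
    2*n*beta*(p-1)/p used in the rest of the discussion.
    """
    return 2 * (p - 1) * alpha + (2 * n * beta + n * gamma) * (1 - 1 / p)
```

For example, with p = 16 GPUs and n = beta = 1 (alpha and gamma ignored), the simplified cost is 2 * 15/16 = 1.875.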

In this embodiment, with reference to fig. 4, the All_reduce operation across the 16 GPUs of the two nodes is performed with the ring communication method, as is the All_reduce operation among the 8 GPUs inside each node. Between nodes, the All_reduce operation is carried out only by the GPUs connected to the IB cards, i.e., the reduced data is concentrated on GPU0 and GPU7.

In some embodiments of the invention, the method further comprises: dividing the data to be integrated into a plurality of preset data blocks, and calculating the sizes of the first data block and other data blocks based on the ratio of the inter-node data integration communication complexity to the intra-node data integration communication complexity.

In some embodiments of the invention, the sizes of the other data blocks are equal, and the ratio of the size of the first data block to the size of the other data blocks is equal to the ratio of the inter-node data integration communication complexity to the intra-node data integration communication complexity.

In this embodiment, the data is divided into k+1 blocks, where the k trailing blocks are of equal size and the ratio of the size of the first block to that of the others equals the ratio of the inter-node ring All_reduce communication complexity to the intra-node ring All_reduce communication complexity, so that the two steps take a consistent amount of time.
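The block-sizing rule can be sketched as follows (an illustrative helper, not from the patent text: `r` stands for the ratio of inter-node to intra-node All_reduce complexity, and `k` is the number of equal trailing blocks):

```python
def split_blocks(total, k, r):
    """Split `total` units of data into k+1 blocks: one first block that is
    r times the size of each of the k equal trailing blocks."""
    other = total / (r + k)          # size of each of the k trailing blocks
    return [r * other] + [other] * k
```

For instance, split_blocks(120, 2, 10) yields a first block of 100 and two trailing blocks of 10 each, preserving the total.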

In some embodiments of the present invention, performing intra-node data integration on all GPUs inside the current node comprises: and connecting all GPUs in the current node through NVswitch, and performing intra-node data integration on all the GPUs.

In some embodiments of the present invention, performing inter-node data integration between a first GPU of a current node and a last GPU of a previous neighboring node includes: performing current data block integration between nodes on a first GPU of a current node and a last GPU of a previous adjacent node, and simultaneously performing intra-node next data block integration on all GPUs in the current node; performing inter-node data integration on the last GPU of the current node and the first GPU of the next adjacent node comprises the following steps: and performing current data block integration between nodes on the last GPU of the current node and the first GPU of the next adjacent node, and performing intra-node next data block integration on all GPUs in the current node.

In this embodiment, with reference to fig. 4, the inter-node ALL_reduce transmission of the first data block is performed first; to avoid wasting intra-node communication bandwidth during this step, the intra-node ALL_reduce can run at the same time, after which GPU0 and GPU7 of both nodes hold the integrated data. The complexity of the intra-node operation is 2nβ1(p1-1)/p1, where β1 = β/10 and p1 = 8. Afterwards the all_reduce operation between the nodes is performed, so that GPU0 and GPU7 of both nodes obtain the data integrated across all GPUs; the complexity of this process is 2nβ(p/8-1)/(p/8). The data of GPU0 and GPU7 is then broadcast, with complexity nβ1(p1/2-1)/(p1/2), so that data block 2 completes the entire All_reduce operation. Meanwhile the inter-node All_reduce of data block 3 starts, and the next cycle begins.
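Under the stated assumptions β1 = β/10 and p1 = 8 GPUs per node, the three per-block step costs above can be evaluated with a small helper (names illustrative; this merely plugs numbers into the formulas in the text):

```python
def step_costs(n, beta, p, p1=8):
    """Return (intra, inter, bcast) costs for one data block, following the
    complexity expressions in the text."""
    beta1 = beta / 10.0                             # intra-node unit time
    intra = 2 * n * beta1 * (p1 - 1) / p1           # intra-node All_reduce
    inter = 2 * n * beta * (p / p1 - 1) / (p / p1)  # inter-node All_reduce
    bcast = n * beta1 * (p1 / 2 - 1) / (p1 / 2)     # broadcast from GPU0/GPU7
    return intra, inter, bcast
```

For the two-node example (p = 16) with n = beta = 1, the inter-node step dominates at 1.0 versus 0.175 intra-node, which is why the inter-node transfer of one block is overlapped with the intra-node work on the next.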

In some embodiments of the present invention, performing inter-node data integration between a first GPU of a current node and a last GPU of a previous neighboring node includes: connecting a head GPU of the current node and a last GPU of the previous adjacent node through an IB card, and performing data integration between nodes on the head GPU of the current node and the last GPU of the previous adjacent node; performing inter-node data integration on the last GPU of the current node and the first GPU of the next adjacent node comprises the following steps: and connecting the last GPU of the current node and the first GPU of the next adjacent node through the IB card, and performing data integration between nodes on the last GPU of the current node and the first GPU of the next adjacent node.

In this embodiment, on a two-node A100 cluster with a high-speed NVSWITCH network, the measured speedup reaches 1.87x with no loss of precision. The multi-layer ring communication method effectively improves the communication bandwidth of a multi-node GPU server, alleviates the slowness of IB network communication to a certain extent, and achieves a clear acceleration effect.

In some embodiments of the present invention, broadcasting the data in the first GPU of the current node and the data in the last GPU of the current node to the other GPUs inside the current node includes: determining whether the data in the first GPU of the current node is the same as the data in the last GPU of the current node; if the data is the same, broadcasting the data in the first GPU to the first half of the other GPUs inside the current node and the data in the last GPU to the second half; and if the data is different, broadcasting the data in the first GPU and the data in the last GPU to all the other GPUs inside the current node, respectively.

In the present embodiment, with reference to fig. 4 and taking two nodes as an example, the data in GPU0 and GPU7 is identical at this point, so both broadcast their data: GPU0 serves GPUs 0-3 and GPU7 serves GPUs 4-7.
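The split broadcast can be sketched as a toy function (illustrative names, not the patent's implementation; each entry of `gpus` stands for one GPU's buffer):

```python
def split_broadcast(gpus, first=0, last=7):
    """GPU `first` serves the first half of the node, GPU `last` the second
    half, halving the broadcast load of each source GPU."""
    n = len(gpus)
    for g in range(n):
        gpus[g] = gpus[first] if g < n // 2 else gpus[last]
    return gpus
```

With GPU0 and GPU7 holding the same reduced value, the whole node ends up with that value while each source GPU only serves four peers.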

It should be particularly noted that the steps in the above embodiments of the method for multi-node cluster ring communication may be interleaved, replaced, added, or deleted with respect to one another; therefore, methods obtained by such reasonable permutations and combinations shall also belong to the scope of the present invention, and the embodiments shall not limit that scope.

In view of the above object, a second aspect of the embodiments of the present invention provides an apparatus for multi-node cluster ring communication. Fig. 5 is a schematic diagram illustrating an embodiment of an apparatus for multi-node cluster ring communication provided by the present invention. As shown in fig. 5, the embodiment of the present invention includes the following modules: the first module S11 is configured to perform intra-node data integration on all GPUs inside a current node, and summarize single-node data obtained by the integration into a first GPU and a last GPU; a second module S12, configured to perform inter-node data integration on the head GPU of the current node and the last GPU of the previous adjacent node, and summarize multi-node data obtained by the integration into the head GPU of the current node and the last GPU of the previous adjacent node; a third module S13, configured to perform inter-node data integration on the last GPU of the current node and the first GPU of the next adjacent node, and aggregate multi-node data obtained by the integration into the last GPU of the current node and the first GPU of the next adjacent node; and a fourth module S14, configured to broadcast the data in the first GPU of the current node and the data in the last GPU of the current node to other GPUs inside the current node.

In view of the above object, a third aspect of the embodiments of the present invention provides a computer device. Fig. 6 is a schematic diagram of an embodiment of a computer device provided by the present invention. As shown in fig. 6, the embodiment of the present invention includes the following means: at least one processor S21; and a memory S22, the memory S22 storing computer instructions S23 executable on the processor, the instructions when executed by the processor implementing the steps of the above method.

The invention also provides a computer readable storage medium. FIG. 7 is a schematic diagram illustrating an embodiment of a computer-readable storage medium provided by the present invention. As shown in fig. 7, the computer readable storage medium S31 stores a computer program S32 which, when executed by a processor, performs the method as described above.

Finally, it should be noted that, as one of ordinary skill in the art will appreciate, all or part of the processes of the above method embodiments may be implemented by a computer program instructing relevant hardware. The program of the method for multi-node cluster ring communication may be stored in a computer-readable storage medium, and when executed, the program may include the processes of the method embodiments described above. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.

Furthermore, the methods disclosed according to embodiments of the present invention may also be implemented as a computer program executed by a processor, which may be stored in a computer-readable storage medium. Which when executed by a processor performs the above-described functions defined in the methods disclosed in embodiments of the invention.

Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.

In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.

The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
