Lumped RDMA link management method

Document number: 649761. Publication date: 2021-05-14.

Reading note: this technology, "Lumped RDMA link management method" (一种集总式rdma链接管理的方法), was designed by 朱珂, 王伟岐, 王盼, 林谦, 王永胜, 徐庆阳, 王晓雪, 姜海斌, 夏云飞, and 袁婉甄 on 2021-03-02. Abstract: The invention discloses a method of lumped RDMA link management. The task receiving module obtains a task indication from kernel space and sends a request signal to the lumped link management module; the task completion module sends a request signal to the lumped link management module according to its own working state; the lumped link management module issues the information to each submodule as bit fields, as appropriate; the send packetizing module sends a request signal to the lumped link management module when it is idle; the receive unpacking module sends a request signal and the corresponding link serial number to the lumped link management module according to each received packet; and the on-chip cache module sends a request signal according to its idle, full, and empty states and holds infrequently used Q sequences. Using the lumped link management module isolates the signals of the submodules from one another, which reduces the coupling within the RDMA engine and the probability of deadlock and similar conditions.

1. A method of lumped RDMA link management, characterized by comprising a task receiving module, a task completion module, a lumped link management module, a send packetizing module, a receive unpacking module, and an on-chip cache module, wherein:

the task receiving module obtains a task indication from kernel space and sends a request signal to the lumped link management module;

the task completion module sends a request signal to the lumped link management module according to its own working state;

the lumped link management module issues the information to each submodule as bit fields, as appropriate;

the send packetizing module sends a request signal to the lumped link management module when it is idle;

the receive unpacking module sends a request signal and the corresponding link serial number to the lumped link management module according to each received packet;

and the on-chip cache module sends a request signal according to its idle, full, and empty states and holds infrequently used Q sequences.

2. The method of claim 1, wherein, when the task receiving module obtains a task indication from kernel space and sends a request signal to the lumped link management module, if the lumped link management module has an idle bit field and the L and W bit-field segments are not locked, the lumped link management module replies to the task receiving module with the designated bit-field segment and at the same time locks the L bit, which prohibits other submodules from using the bit field; the task receiving module fills in the bit field according to the task characteristics obtained from kernel space, the bit field comprising a sequence number Q, link registers (Registers), and a packet-processing context (Context); and after the lumped link management module receives the bit field, it unlocks the L bit so that other submodules can access the bit field.

3. The method of claim 1, wherein, when the task completion module sends a request signal to the lumped link management module according to its own working state, the lumped link management module replies, according to the L and C bits, with the information relevant to the task completion module, including the task completion state and exception information; the task completion module consolidates the relevant bit-field information and reports it to kernel space, triggering an interrupt if interrupts are enabled; and after receiving the completion reply, the lumped link management module clears the bit field of the corresponding link sequence and releases the bit-field resource.

4. The method of claim 1, wherein the lumped link management module issues bit fields to the submodules as appropriate, according to the request signal of each submodule and according to information such as L, Q, W, C, T, R, and Registers, so as to manage the RDMA links in an orderly manner.

5. The method of claim 1, wherein, when the send packetizing module is idle and sends a request signal to the lumped link management module, if the lumped link management module has an idle bit field and the L and T bit-field segments are not locked, the lumped link management module replies to the task receiving module with the designated bit-field segment and at the same time locks the L bit, which prohibits other submodules from using the bit field; the send packetizing module extracts the valid information from Q, Registers, and Context and, after verification passes, fetches the data to be sent from user space, assembles it into packets, and sends it; and after the lumped link management module receives the written-back bit field, it unlocks the L bit so that other submodules can access the bit field.

6. The method of lumped RDMA link management of any of claims 1-5, wherein the receive unpacking module sends a request signal and the corresponding link serial number to the lumped link management module according to each received packet; if the lumped link management module holds an allocated bit field whose Q matches the link serial number, it sends the corresponding bit-field information to the receive unpacking module; after verifying that information, the receive unpacking module delivers the corresponding payload data to user space and writes the bit field back to the lumped link management module; and after receiving the written-back bit field, the lumped link management module unlocks the L bit and updates the Context so that other submodules can access the bit field, and sets the C bit if the packet is determined to be the last packet of the data.

7. The method of lumped RDMA link management of any of claims 1-6, wherein the on-chip cache module sends a request signal according to its idle, full, and empty states and holds infrequently used Q sequences; the lumped link management module decides, according to the busy and idle bits and the lock bit L of each bit field, whether to store that bit field temporarily in SRAM; and a bit field that is not frequently used is moved into the cache and replaced by a new bit field, which achieves fair and efficient scheduling of a large number of links.

Technical Field

The invention relates to the technical field of communication, in particular to a method for managing lumped RDMA links.

Background

RDMA (Remote Direct Memory Access) is remote direct memory access: after the corresponding engine obtains an instruction from the local or peer device and the instruction passes parsing and verification, the engine moves data directly between user space in local memory and user space in peer memory. No data is copied between kernel space and user space in this process, which raises throughput, lowers latency, and reduces CPU load, so RDMA suits cluster computing that is sensitive to bandwidth and latency.

While moving RDMA data, the engine must effectively manage states such as link initialization, packet exchange limits, packet time in flight, acknowledgment replies, and completion and exception reporting in order to transmit data stably and efficiently. The complexity of the protocol and interference from the external environment force the engine design to cover many cases, which makes the internal design of an RDMA engine complex.

At present, the different states of the working process are managed by dividing the RDMA engine into submodules with different functions, and these dedicated submodules usually exchange signals with one another directly. For example, if the engine has both a sending module and a receiving module, then when the receiving module receives an abnormal message and must ask the peer RDMA engine to retransmit, it has to notify the sending module directly to generate the corresponding request message; likewise, the task receiving module fetches a new task descriptor from memory and then issues it to the sending module. The sending module therefore has to handle several signal sources, which places higher demands on its design.

Distributed RDMA link management can improve the response rate of the system to some extent, but it couples the system more tightly, tends to create semaphore dependencies, and carries a risk of deadlock. The number of links is limited and concurrency is low, and the large amount of interaction inside the engine severely limits extensibility.

Disclosure of Invention

Aiming at the defects of the prior art, the invention aims to provide a method for managing lumped RDMA links, which solves the problems in the background technology.

The invention provides the following technical scheme:

A method of lumped RDMA link management comprises a task receiving module, a task completion module, a lumped link management module, a send packetizing module, a receive unpacking module, and an on-chip cache module, wherein:

the task receiving module obtains a task indication from kernel space and sends a request signal to the lumped link management module;

the task completion module sends a request signal to the lumped link management module according to its own working state;

the lumped link management module issues the information to each submodule as bit fields, as appropriate;

the send packetizing module sends a request signal to the lumped link management module when it is idle;

the receive unpacking module sends a request signal and the corresponding link serial number to the lumped link management module according to each received packet;

and the on-chip cache module sends a request signal according to its idle, full, and empty states and holds infrequently used Q sequences.

Preferably, when the task receiving module obtains a task indication from kernel space and sends a request signal to the lumped link management module, if the lumped link management module has an idle bit field and the L and W bit-field segments are not locked, the lumped link management module replies to the task receiving module with the designated bit-field segment and at the same time locks the L bit, which prohibits other submodules from using the bit field; the task receiving module fills in the bit field according to the task characteristics obtained from kernel space, the bit field comprising a sequence number Q, link registers (Registers), and a packet-processing context (Context); after the lumped link management module receives the bit field, it unlocks the L bit so that other submodules can access the bit field.

Preferably, when the task completion module sends a request signal to the lumped link management module according to its own working state, the lumped link management module replies, according to the L and C bits, with the information relevant to the task completion module, including the task completion state and exception information; the task completion module consolidates the relevant bit-field information and reports it to kernel space, triggering an interrupt if interrupts are enabled; after receiving the completion reply, the lumped link management module clears the bit field of the corresponding link sequence and releases the bit-field resource.

Preferably, the lumped link management module issues bit fields to the submodules as appropriate, according to the request signal of each submodule and according to information such as L, Q, W, C, T, R, and Registers, so as to manage the RDMA links in an orderly manner.

Preferably, when the send packetizing module is idle and sends a request signal to the lumped link management module, if the lumped link management module has an idle bit field and the L and T bit-field segments are not locked, the lumped link management module replies to the task receiving module with the designated bit-field segment and at the same time locks the L bit, which prohibits other submodules from using the bit field; the send packetizing module extracts the valid information from Q, Registers, and Context and, after verification passes, fetches the data to be sent from user space, assembles it into packets, and sends it; after the lumped link management module receives the written-back bit field, it unlocks the L bit so that other submodules can access the bit field.

Preferably, the receive unpacking module sends a request signal and the corresponding link serial number to the lumped link management module according to each received packet; if the lumped link management module holds an allocated bit field whose Q matches the link serial number, it sends the corresponding bit-field information to the receive unpacking module; after verifying that information, the receive unpacking module delivers the corresponding payload data to user space and writes the bit field back to the lumped link management module; after receiving the written-back bit field, the lumped link management module unlocks the L bit and updates the Context so that other submodules can access the bit field, and sets the C bit if the packet is determined to be the last packet of the data.

Preferably, the on-chip cache module sends a request signal according to its idle, full, and empty states and holds infrequently used Q sequences; the lumped link management module decides, according to the busy and idle bits and the lock bit L of each bit field, whether to store that bit field temporarily in SRAM; a bit field that is not frequently used is moved into the cache and replaced by a new bit field, which achieves fair and efficient scheduling of a large number of links.

Preferably, a group of sending and receiving submodules is added to the RDMA engine to increase system bandwidth, or submodules are added and the task-processing chain is changed, using the same handshake scheme.

Preferably, an additional bit field is added in the lumped link management module to record the buffering status of the link resources.

Preferably, the on-chip cache module uses a multi-way set-associative organization to improve lookup efficiency.

Preferably, the on-chip cache module is further connected to an off-chip high-capacity buffer module to increase the cache capacity of the RDMA engine.

Compared with the prior art, the invention has the following beneficial effects:

(1) By using a lumped link management module, the method isolates the signals of the submodules from one another, which reduces the coupling within the RDMA engine and the probability of deadlock.

(2) When the number of bit fields is small, the lumped link management module can run at a relatively high clock frequency, so a submodule obtains valid bit-field information shortly after it sends a request. This partly offsets the lower response rate that a non-distributed management method would otherwise have.

(3) The method builds a multi-level storage system in which link pairs are periodically parked in SRAM and SDRAM. This makes full use of the time that data packets spend in flight and makes a high-concurrency design simpler and more efficient.

(4) By allocating bit fields at arbitrary locations and mapping the address signals of the status registers to link serial numbers, the method improves the utilization of storage resources, reduces register usage, and improves running speed.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

Fig. 1 is an overall block diagram of the present invention.

Detailed Description

To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawing. The described embodiments are only some of the embodiments of the present invention, not all of them.

The following detailed description of the embodiments, as presented in the drawing, is therefore not intended to limit the scope of the invention as claimed, but merely represents selected embodiments. All other embodiments that a person skilled in the art can obtain without inventive effort, based on the embodiments of the present invention, fall within the scope of protection of the present invention.

Example one

Refer to FIG. 1. To highlight the essence of the method, the interrupt and register-allocation submodules are not shown in FIG. 1. A method of lumped RDMA link management comprises a task receiving module, a task completion module, a lumped link management module, a send packetizing module, a receive unpacking module, and an on-chip cache module. The task receiving module obtains a task indication from kernel space and sends a request signal to the lumped link management module; the task completion module sends a request signal to the lumped link management module according to its own working state; the lumped link management module issues the information to each submodule as bit fields, as appropriate; the send packetizing module sends a request signal to the lumped link management module when it is idle; the receive unpacking module sends a request signal and the corresponding link serial number to the lumped link management module according to each received packet; and the on-chip cache module sends a request signal according to its idle, full, and empty states and holds infrequently used Q sequences.

When the task receiving module obtains a task indication from kernel space, it sends a request signal to the lumped link management module. If the lumped link management module has an idle bit field and the L and W bit-field segments are not locked, it replies to the task receiving module with the designated bit-field segment and locks the L bit, which prohibits other submodules from using the bit field. The task receiving module fills in the bit field according to the task characteristics obtained from kernel space; the bit field comprises a sequence number Q, link registers (Registers), and a packet-processing context (Context). After the lumped link management module receives the bit field, it unlocks the L bit so that other submodules can access the bit field.
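The request, lock, fill, and unlock sequence described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the patented hardware: the class names `BitField` and `LinkManager`, the method names, and the reading of W as "entry already holds a written task" are all hypothetical, since the text names the bits without defining them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BitField:
    """One per-link entry. Bit meanings are assumptions, not the patent's definitions."""
    L: bool = False            # lock bit: entry is owned by one submodule
    W: bool = False            # assumed: entry already holds a written task
    Q: Optional[int] = None    # sequence number of the link
    registers: dict = field(default_factory=dict)   # "Registers" segment
    context: dict = field(default_factory=dict)     # "Context" segment


class LinkManager:
    """Hypothetical software stand-in for the lumped link management module."""

    def __init__(self, n_entries: int):
        self.entries = [BitField() for _ in range(n_entries)]

    def request_free(self) -> Optional[int]:
        """Grant an idle entry (L and W both clear) and lock it; None if none is idle."""
        for i, e in enumerate(self.entries):
            if not e.L and not e.W:
                e.L = True
                return i
        return None

    def write_back(self, idx: int, q: int, registers: dict, context: dict) -> None:
        """Store the filled-in entry and unlock it so other submodules can use it."""
        e = self.entries[idx]
        if not e.L:
            raise RuntimeError("entry must be locked by the requester before write-back")
        e.Q, e.registers, e.context = q, dict(registers), dict(context)
        e.W = True
        e.L = False
```

In this sketch the task receiving side would call `request_free()`, fill the granted entry from the kernel-space task, and then call `write_back(...)`; a `None` grant models the case where no idle bit field is available and the request has to be retried.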

When the task completion module sends a request signal to the lumped link management module according to its own working state, the lumped link management module replies, according to the L and C bits, with the information relevant to the task completion module, including the task completion state and exception information. The task completion module consolidates the relevant bit-field information and reports it to kernel space, triggering an interrupt if interrupts are enabled. After receiving the completion reply, the lumped link management module clears the bit field of the corresponding link sequence and releases the bit-field resource.
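A minimal sketch of this completion path is given below, assuming each entry is modeled as a plain dictionary. The helper names (`new_entry`, `poll_completion`, `raise_interrupt_if_enabled`, `acknowledge`) and the `status` and `error` keys inside the context are hypothetical; the text only says that completion state and exception information are reported.

```python
def new_entry():
    """Return an empty entry; the dict layout is a hypothetical stand-in for a bit field."""
    return {"L": False, "C": False, "Q": None, "context": {}}


def poll_completion(entries):
    """Find the first entry whose C bit is set and whose L bit is clear.

    Returns (index, report) or None when nothing is ready to report.
    """
    for i, e in enumerate(entries):
        if e["C"] and not e["L"]:
            report = {
                "Q": e["Q"],
                "status": e["context"].get("status"),   # assumed key
                "error": e["context"].get("error"),     # assumed key
            }
            return i, report
    return None


def raise_interrupt_if_enabled(report, interrupt_enabled, handler):
    """Invoke the handler only when interrupts are enabled."""
    if interrupt_enabled:
        handler(report)


def acknowledge(entries, idx):
    """Completion reply received: clear the entry and release it for reuse."""
    entries[idx] = new_entry()
```

Skipping locked entries in `poll_completion` reflects the reply being made "according to the L and C bits": a link that another submodule still owns is not reported yet.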

The lumped link management module issues bit fields to the submodules as appropriate, according to the request signal of each submodule and according to information such as L, Q, W, C, T, R, and Registers, so that the RDMA links are managed in an orderly manner. Managing the link resources centrally makes it possible to build a hierarchical storage system in which register resources are converted into SRAM and SDRAM resources, which increases the number of connections that can be managed and balances fairness and timeliness of communication among different endpoints.
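The text does not say how the lumped link management module chooses among submodules that request at the same time. One common way to serve competing requesters fairly is round-robin arbitration; the sketch below shows that technique purely as an assumption, with the hypothetical function name `round_robin_grant`.

```python
def round_robin_grant(requests, last_granted):
    """Pick the next requesting submodule after `last_granted`, wrapping around.

    requests: list of booleans, one request line per submodule.
    last_granted: index granted in the previous round.
    Returns the granted index, or None when no submodule is requesting.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        i = (last_granted + offset) % n
        if requests[i]:
            return i
    return None
```

Because the search starts just after the previous winner, a submodule that requests continuously cannot starve the others, which is one way to obtain the fairness the paragraph mentions.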

When the send packetizing module is idle, it sends a request signal to the lumped link management module. If the lumped link management module has an idle bit field and the L and T bit-field segments are not locked, it replies to the task receiving module with the designated bit-field segment and at the same time locks the L bit, which prohibits other submodules from using the bit field. The send packetizing module extracts the valid information from Q, Registers, and Context and, after verification passes, fetches the data to be sent from user space, assembles it into packets, and sends it. After the lumped link management module receives the written-back bit field, it unlocks the L bit so that other submodules can access the bit field.
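The send path can be illustrated with the following sketch, in which a sender claims an entry, verifies it, cuts one packet out of a user-space buffer, and writes the updated context back. Several details are assumptions made for the example: T is read as "data still pending transmission", the `mtu` register and the `offset` and `length` context keys are hypothetical, and the sender itself is modeled as the party that claims the entry.

```python
def new_entry(q, length, mtu=4):
    """Hypothetical entry layout: T is assumed to mean 'data still to be sent'."""
    return {"L": False, "T": True, "Q": q,
            "registers": {"mtu": mtu},
            "context": {"offset": 0, "length": length}}


def claim_for_send(entries):
    """Lock and return the first entry with pending data that nobody else owns."""
    for i, e in enumerate(entries):
        if e["T"] and not e["L"]:
            e["L"] = True
            return i
    return None


def send_one_packet(entries, idx, user_buffer):
    """Verify the entry, build one packet from user space, write back, and unlock."""
    e = entries[idx]
    ctx, mtu = e["context"], e["registers"]["mtu"]
    # Verification step: the context must describe a range inside the buffer.
    if ctx["offset"] >= ctx["length"] or ctx["length"] > len(user_buffer):
        e["L"] = False
        return None
    end = min(ctx["offset"] + mtu, ctx["length"])
    payload = user_buffer[ctx["offset"]:end]
    ctx["offset"] = end                 # write-back of the updated context
    if end == ctx["length"]:
        e["T"] = False                  # nothing left to transmit on this link
    e["L"] = False                      # unlock so other submodules may access it
    return {"Q": e["Q"], "payload": payload}
```

Sending one packet per claim, rather than draining the whole buffer, keeps the lock short, so other links can be interleaved while earlier packets are in flight.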

The receive unpacking module sends a request signal and the corresponding link serial number to the lumped link management module according to each received packet. If the lumped link management module holds an allocated bit field whose Q matches the link serial number, it sends the corresponding bit-field information to the receive unpacking module. After verifying that information, the receive unpacking module delivers the corresponding payload data to user space and writes the bit field back to the lumped link management module. After receiving the written-back bit field, the lumped link management module unlocks the L bit and updates the Context so that other submodules can access the bit field, and sets the C bit if the packet is determined to be the last packet of the data.

The on-chip cache module sends a request signal according to its idle, full, and empty states and holds infrequently used Q sequences. The lumped link management module decides, according to the busy and idle bits and the lock bit L of each bit field, whether to store that bit field temporarily in SRAM. A bit field that is not frequently used is moved into the cache and replaced by a new bit field, which achieves fair and efficient scheduling of a large number of links.
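The lookup by link serial number and the spilling of infrequently used entries can be sketched together as a two-level store. This is a simplified model: the class name `TwoLevelStore` is hypothetical, and least-recently-used replacement is an assumption standing in for "not frequently used", since the text does not name a replacement policy. Locked entries are never evicted.

```python
from collections import OrderedDict


class TwoLevelStore:
    """Fast entries (register-like) backed by a larger spill area (SRAM-like)."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # Q -> entry, least recently used first
        self.sram = {}              # Q -> entry, spilled entries
        self.cap = fast_capacity

    def _evict_one(self):
        """Spill the least recently used unlocked entry; False if all are locked."""
        victim = next((q for q, e in self.fast.items() if not e["L"]), None)
        if victim is None:
            return False
        self.sram[victim] = self.fast.pop(victim)
        return True

    def insert(self, q, entry):
        """Add a new link entry, spilling an old one first if the fast level is full."""
        if len(self.fast) >= self.cap and not self._evict_one():
            return False
        self.fast[q] = entry
        return True

    def lookup(self, q):
        """Return the entry for link serial number q, refilling from SRAM on a miss."""
        if q in self.fast:
            self.fast.move_to_end(q)          # mark as recently used
            return self.fast[q]
        if q in self.sram:
            if len(self.fast) >= self.cap and not self._evict_one():
                return None                   # every fast entry is locked; retry later
            self.fast[q] = self.sram.pop(q)
            return self.fast[q]
        return None                           # no link with this serial number
```

A receive path built on this sketch would call `lookup(q)` with the serial number carried by the incoming packet and treat `None` as either an unknown link or a temporary shortage of unlocked entries.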

By defining the link communication state between endpoints as specific bit-field segments, the invention establishes a context mechanism for communication and improves the concurrency and extensibility of communication. A locking mechanism inside the bit-field segment gives each resource a single owner at a time, which avoids conflicts between receive and send processing and reduces the risk of deadlock. The handshake mechanism between the lumped link management module and the individual submodules of the RDMA protocol avoids direct signal interaction between modules, lets each submodule run at its own rate, separates state (data) from behavior (algorithm), decouples the system, and improves stability. Configuration and status register resources are moved into SRAM and SDRAM resources, and index addresses are mapped to link serial numbers according to a specific rule, so storage can be allocated at arbitrary locations and the register overhead is reduced.

Example two

This example uses the same modules, the same handshake procedures (task receiving, task completion, bit-field issuing, sending, receiving and unpacking, and on-chip caching), and the same locking and storage mechanisms as Example one, with the additions described below.

A group of sending and receiving submodules is added to the RDMA engine to increase system bandwidth, or submodules are added and the task-processing chain is changed, using the same handshake scheme. An additional bit field is added in the lumped link management module to record the buffering status of the link resources. The on-chip cache module uses a multi-way set-associative organization to improve lookup efficiency. The on-chip cache module is also connected to an off-chip high-capacity buffer module to increase the cache capacity of the RDMA engine.
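To illustrate why a set-associative organization speeds up lookup, the sketch below maps each link serial number to one small set and searches only the ways of that set. The class name `SetAssociativeCache`, the modulo mapping rule, and the per-set least-recently-used replacement are all assumptions made for the example; the text specifies none of them.

```python
class SetAssociativeCache:
    """Multi-way set-associative store keyed by link serial number Q."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # Each set is a list of (Q, entry) pairs, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def _set_index(self, q):
        return q % self.num_sets          # hypothetical mapping rule

    def get(self, q):
        """Search only the ways of one set instead of the whole cache."""
        s = self.sets[self._set_index(q)]
        for pos, (tag, entry) in enumerate(s):
            if tag == q:
                s.append(s.pop(pos))      # move to most-recently-used position
                return entry
        return None

    def put(self, q, entry):
        """Insert or update an entry; return the evicted (Q, entry) pair, if any."""
        s = self.sets[self._set_index(q)]
        for pos, (tag, _) in enumerate(s):
            if tag == q:
                s.pop(pos)
                break
        evicted = s.pop(0) if len(s) >= self.ways else None
        s.append((q, entry))
        return evicted
```

In a design with the off-chip buffer mentioned above, the pair returned by `put` is what would be written out to the larger, slower level.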

Lumped management of RDMA link resources reduces the coupling among the submodules; the multi-level storage design improves the concurrency of link communication; the use of lockable bit-field segments and handshake signals improves the stability of the RDMA engine; and mapping the address signals of the configuration and status registers to link serial numbers, with allocation at arbitrary locations, reduces the storage resources occupied.

The method of lumped RDMA link management fully decouples the modules inside the engine without significantly increasing latency, which improves the stability and feasibility of the engine.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
