Method and device for processing I/O (input/output) requests

Document No.: 1056785 | Published: 2020-10-13

Abstract: "Method and device for processing I/O requests" was devised by Li Langbo (李浪波) and Zhang Mingqian (张明谦) on 2018-07-17. An embodiment of the invention provides a method and a device for processing an I/O request. A host and a storage system communicate through the NVMeoF protocol, the storage system comprises a logical disk, and the host accesses the logical disk through a control node in the storage system. The method comprises: sending a state query command to the control node, wherein the state query command instructs the control node to report the path state of the path on which the control node is located; receiving the path state reported by the control node; when the received path state indicates that the logical disk comprises an access interval, sending an interval query command to the control node; receiving access interval information reported by the control node, wherein the access interval indicated by the access interval information is pre-allocated to the control node; and recording the mapping relation between the control node and the access interval information.

1. A method of data processing, comprising:

sending, by a host, an interval query command to a storage device, wherein the interval query command is used to query a logical address access interval of a logical disk in the storage device;

and receiving, by the host, logical address access interval information reported by the storage device.

2. The method of claim 1, wherein the logical address access interval information comprises description information of the logical address access interval, the description information describing an address space of the logical address access interval.

3. The method of claim 1 or 2, wherein the logical address access interval information comprises a number of logical address access intervals included by the logical disk.

4. The method of claim 2, wherein the address space of the logical address access intervals is a contiguous address space.

5. The method of claim 4, wherein the description information includes an access interval head address and an access interval length.

6. The method of claim 2, wherein the logical address access interval comprises a first sub-access interval and a second sub-access interval, the first sub-access interval and the second sub-access interval having a gap therebetween.

7. The method of claim 6, wherein the first sub-access interval and the second sub-access interval are adjacent sub-access intervals, and the description information includes an access interval head address, a length of each sub-access interval, and a gap between two adjacent sub-access intervals.

8. The method of any one of claims 1-7, wherein the host communicates with a storage device over an external network, and wherein sending an interval query command to the storage device comprises:

encapsulating the interval query command in the external network protocol to obtain an external network protocol interval query command;

and sending the external network protocol interval query command to the storage device.

9. The method of claim 8, wherein the receiving the logical address access interval information reported by the storage device comprises:

receiving an external network protocol interval response message sent by the storage device, wherein the external network protocol interval response message comprises an interval query command response, and the interval query command response comprises the logical address access interval information;

parsing the external network protocol interval response message to obtain the interval query command response;

and obtaining the logical address access interval information from the interval query command response.

10. A method of data processing, comprising:

receiving, by a storage device, an access interval query command sent by a host, wherein the query command is used to query a logical address access interval of a logical disk in the storage device, and the storage device supports the Non-Volatile Memory Express (NVMe) protocol;

and reporting, by the storage device, logical address access interval information of the logical disk to the host according to the interval query command.

11. The method of claim 10, wherein the logical address access interval information comprises description information of the logical address access interval, the description information describing an address space of the logical address access interval.

12. The method of claim 10 or 11, wherein the logical address access interval information comprises a number of logical address access intervals included by the logical disk.

13. The method of claim 11, wherein the address space of the logical address access interval is a contiguous address space.

14. The method of claim 13, wherein the description information includes an access interval head address and an access interval length.

15. The method of claim 11, wherein the logical address access interval comprises a first sub-access interval and a second sub-access interval, the first sub-access interval and the second sub-access interval having a gap therebetween.

16. The method of claim 15, wherein the first sub-access interval and the second sub-access interval are adjacent sub-access intervals, and the description information includes an access interval head address, a length of each sub-access interval, and a gap between two adjacent sub-access intervals.

17. The method according to any one of claims 10 to 16, wherein the host and the storage device communicate via an external network, and reporting the logical address access interval information of the logical disk to the host according to the interval query command comprises:

encapsulating the logical address access interval information in the external network protocol to obtain external network protocol report information;

and sending the external network protocol report information to the host.

18. A host, comprising:

an interval query module, configured to send an interval query command to a storage device, wherein the interval query command is used to query a logical address access interval of a logical disk in the storage device;

and a receiving module, configured to receive logical address access interval information reported by the storage device.

19. The host of claim 18, wherein the logical address access interval information includes description information of the logical address access interval, the description information describing an address space of the logical address access interval.

20. The host of claim 18 or 19, wherein the logical address access interval information comprises a number of logical address access intervals included by the logical disk.

21. The host of claim 19, wherein the address space of the logical address access interval is a contiguous address space.

22. The host of claim 21, wherein the description information includes an access interval head address and an access interval length.

23. The host of claim 19, wherein the logical address access interval comprises a first sub-access interval and a second sub-access interval, with a gap between the first sub-access interval and the second sub-access interval.

24. The host of claim 23, wherein the first sub-access interval and the second sub-access interval are adjacent sub-access intervals, and the description information includes an access interval head address, a length of each sub-access interval, and a gap between two adjacent sub-access intervals.

25. The host according to any one of claims 18 to 24, wherein the host communicates with the storage device via an external network, and the interval query module is specifically configured to:

encapsulating the interval query command in the external network protocol to obtain an external network protocol interval query command;

and sending the external network protocol interval query command to the storage device.

26. The host of claim 25, wherein the receiving module is specifically configured to:

receive an external network protocol interval response message sent by the storage device, wherein the external network protocol interval response message comprises an interval query command response, and the interval query command response comprises the logical address access interval information;

parse the external network protocol interval response message to obtain the interval query command response;

and obtain the logical address access interval information from the interval query command response.

27. A storage device, the storage device comprising:

a receiving module, configured to receive an access interval query command sent by a host, wherein the query command is used to query a logical address access interval of a logical disk in the storage device;

and a partition reporting module, configured to report logical address access interval information of the logical disk to the host according to the interval query command.

28. The storage device of claim 27, wherein the logical address access interval information includes description information of the logical address access interval, the description information describing an address space of the logical address access interval.

29. The storage device of claim 27 or 28, wherein the logical address access interval information comprises a number of logical address access intervals included by the logical disk.

30. The storage device of claim 28, wherein the address space of the logical address access interval is a contiguous address space.

31. The storage device of claim 30, wherein the description information includes an access interval head address and an access interval length.

32. The storage device of claim 28, wherein the logical address access interval includes a first sub-access interval and a second sub-access interval, the first sub-access interval and the second sub-access interval having a gap therebetween.

33. The storage device of claim 32, wherein the first sub-access interval and the second sub-access interval are adjacent sub-access intervals, and the description information includes an access interval head address, a length of each sub-access interval, and a gap between two adjacent sub-access intervals.

34. The storage device according to any one of claims 27 to 33, wherein the host and the storage device communicate via an external network, and the partition reporting module is specifically configured to:

encapsulating the logical address access interval information in the external network protocol to obtain external network protocol report information;

and sending the external network protocol report information to the host.

Technical Field

The present application relates to the field of storage, and in particular, to a method and apparatus for processing an I/O request.

Background

In an existing storage architecture based on the NVMe over Fabrics (NVMeoF) protocol, a storage system includes a plurality of control nodes and a logical disk (e.g., a Namespace). A host accesses the logical disk through the plurality of control nodes, so a plurality of paths exist between the host and the logical disk. The host selects a path in a polling (round-robin) manner to send an I/O request to the storage system. The control node that receives the I/O request then determines, according to a certain algorithm and the logical address carried in the I/O request, which control node is to execute the I/O request. For example, the host issues an I/O request to a first control node through a first path of the multiple paths; the first control node calculates, according to a preset algorithm applied to the logical address in the I/O request, that the I/O request should be executed by a second control node, and then forwards the I/O request to the second control node. Forwarding I/O requests in this way can increase I/O latency.
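The forwarding cost described above can be illustrated with a small simulation. This is only a sketch: the number of control nodes, the block granularity, and the modulo-based `owner_node` function are assumptions standing in for the unspecified "certain algorithm" of the storage system.

```python
from itertools import cycle

NUM_NODES = 4          # assumed number of control nodes (one path per node)
STRIPE_BLOCKS = 1024   # assumed granularity of the placement algorithm


def owner_node(lba: int) -> int:
    """Hypothetical placement rule: which control node must execute this LBA."""
    return (lba // STRIPE_BLOCKS) % NUM_NODES


def count_forwards(lbas) -> int:
    """Issue each request on the next path in turn and count forwarded requests."""
    paths = cycle(range(NUM_NODES))  # round-robin path selection by the host
    forwards = 0
    for lba in lbas:
        receiving_node = next(paths)
        if receiving_node != owner_node(lba):
            forwards += 1  # the receiving node has to forward the request
    return forwards


# Eight requests: four for blocks owned by node 0, four for blocks owned by node 1.
requests = [0, 0, 0, 0, 1024, 1024, 1024, 1024]
print(count_forwards(requests))
```

Under these assumptions, most requests land on a control node that does not own the target address, which is the latency problem the rest of the document addresses.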

Disclosure of Invention

Embodiments of the present invention provide a method, a device, and a host for processing an I/O request, in which an access interval is set for each control node of a storage system.

A first aspect of the embodiments of the present invention provides a data processing method executed by a host. The host and the storage system communicate through the NVMeoF protocol, the storage system comprises a logical disk, and the host accesses the logical disk through a control node in the storage system. The method comprises the following steps: the host sends a state query command to the control node, wherein the state query command instructs the control node to report the path state of the path on which the control node is located, and the host then receives the path state reported by the control node. When the received path state indicates that the logical disk comprises an access interval, the host sends an interval query command to the control node, receives the access interval information reported by the control node, and records the mapping relation between the control node and the access interval information, wherein the access interval indicated by the access interval information is pre-allocated to the control node.
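The two-step discovery in the first aspect (state query, then interval query, then recording the mapping) can be sketched as follows. The `FakeControlNode` class, its method names, and the `STATE_HAS_ACCESS_INTERVAL` value are hypothetical stand-ins; the real commands travel over NVMeoF and are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical path-state value meaning "the logical disk includes access intervals".
STATE_HAS_ACCESS_INTERVAL = "has_access_interval"


@dataclass
class FakeControlNode:
    """In-memory stand-in for a control node reached over one path."""
    node_id: int
    path_state: str
    interval: Optional[Tuple[int, int]] = None  # (head address, length)

    def query_path_state(self) -> str:
        # Stands in for the state query command and its response.
        return self.path_state

    def query_interval(self) -> Tuple[int, int]:
        # Stands in for the interval query command and its response.
        if self.interval is None:
            raise RuntimeError("no access interval allocated to this node")
        return self.interval


def discover_intervals(nodes: Iterable[FakeControlNode]) -> Dict[int, Tuple[int, int]]:
    """Return the host-side mapping: control node id -> access interval."""
    mapping: Dict[int, Tuple[int, int]] = {}
    for node in nodes:
        # Only query the interval when the path state says one exists.
        if node.query_path_state() != STATE_HAS_ACCESS_INTERVAL:
            continue
        mapping[node.node_id] = node.query_interval()
    return mapping


nodes = [
    FakeControlNode(1, STATE_HAS_ACCESS_INTERVAL, (0, 1000)),
    FakeControlNode(2, STATE_HAS_ACCESS_INTERVAL, (1000, 1000)),
    FakeControlNode(3, "other_state"),
]
print(discover_intervals(nodes))
```

Checking the path state first means the host never sends an interval query to a control node whose logical disk is not divided into access intervals.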

The access interval that the storage system sets for each control node is obtained through the state query command of the NVMeoF protocol and the newly added interval query command. When an I/O request is subsequently received, it can therefore be issued to the control node corresponding to the access interval in which the logical address of the I/O request falls, which avoids forwarding of the I/O request.

In one possible design, the method further includes: receiving an I/O request, wherein the I/O request carries a logical address of data to be accessed; determining the access interval in which the logical address falls; determining the control node corresponding to the access interval according to the mapping relation; and sending the I/O request to the control node corresponding to the access interval.
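For contiguous access intervals, the lookup from logical address to control node can be sketched with a sorted list and binary search. The `IntervalRouter` class and its mapping layout (node id to head address and length) are illustrative assumptions, not a structure defined by this document.

```python
from bisect import bisect_right
from typing import Dict, Optional, Tuple


class IntervalRouter:
    """Maps a logical block address to the control node owning its interval."""

    def __init__(self, mapping: Dict[int, Tuple[int, int]]):
        # mapping: control node id -> (head address, length), intervals disjoint.
        self._entries = sorted(
            (head, head + length, node) for node, (head, length) in mapping.items()
        )
        self._heads = [entry[0] for entry in self._entries]

    def node_for(self, lba: int) -> Optional[int]:
        # Find the last interval whose head address is <= lba.
        index = bisect_right(self._heads, lba) - 1
        if index >= 0:
            _head, end, node = self._entries[index]
            if lba < end:
                return node
        return None  # address is not covered by any recorded interval


router = IntervalRouter({1: (0, 1000), 2: (1000, 1000)})
print(router.node_for(999), router.node_for(1000), router.node_for(2000))
```

When `node_for` returns `None`, a host would need some fallback, such as the round-robin path selection described in the Background; that policy is outside this sketch.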

Because the mapping relation between the control node and the access interval information is recorded in the host, a received I/O request can be issued to the control node corresponding to the access interval in which the logical address of the I/O request falls, which avoids forwarding of the I/O request.

In a possible design, when the control node determines that the logical disk includes the access interval, the path state indicating that the logical disk includes the access interval is reported to the host.

By adding to the NVMe protocol a path state indicating that the logical disk includes an access interval, the host can decide from the path state whether to query the access interval.

In one possible design, the access interval query command is defined based on Command Dword 10 of the Get Log Page command in the NVMeoF protocol, and the command identifier of the access interval query command is carried in the Log Page Identifier field of that command.
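As a rough illustration, Command Dword 10 of Get Log Page carries the Log Page Identifier in bits 7:0 and the lower 16 bits of the zero-based dword count (NUMDL) in bits 31:16. The sketch below packs only those two fields and leaves the others at zero; the identifier value `0xC0` is a hypothetical choice from the vendor-specific range, since the identifier actually used for the interval query is not given here.

```python
GET_LOG_PAGE_OPCODE = 0x02   # admin opcode of Get Log Page
INTERVAL_QUERY_LID = 0xC0    # hypothetical vendor-specific Log Page Identifier


def build_get_log_page_cdw10(lid: int, num_dwords: int) -> int:
    """Pack Log Page Identifier and NUMDL into Command Dword 10.

    num_dwords is the number of dwords to return; the field itself is zero-based.
    Other CDW10 fields are left at zero in this sketch.
    """
    if not 0 <= lid <= 0xFF:
        raise ValueError("Log Page Identifier must fit in 8 bits")
    if not 1 <= num_dwords <= 0x10000:
        raise ValueError("this sketch only fills the lower 16 bits of the count")
    numdl = (num_dwords - 1) & 0xFFFF
    return (numdl << 16) | lid


cdw10 = build_get_log_page_cdw10(INTERVAL_QUERY_LID, num_dwords=1024)
print(hex(cdw10))
```

Reusing an existing command this way means a host and a control node only need to agree on the identifier value and on the layout of the returned log page.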

The interval query command is thus defined using a command that already exists in the NVMeoF protocol, without changing the existing protocol.

In one possible design, the access intervals indicated by the access interval information are a continuous address space.

In one possible design, the access interval information includes an access interval head address and an access interval length.

In one possible design, the access interval corresponding to the control node includes a first sub-access interval and a second sub-access interval, and there is a gap between the first sub-access interval and the second sub-access interval.

In one possible design, the first sub-access interval and the second sub-access interval are adjacent sub-access intervals, and the access interval information includes an access interval head address, a length of each sub-access interval, and a gap between two adjacent sub-access intervals.
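A discontinuous access interval described by a head address, a sub-interval length, and a gap can be tested for membership with modular arithmetic. The sketch below assumes the pattern repeats with a fixed period of length plus gap and is not bounded by a sub-interval count; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedInterval:
    """Access interval made of equally sized sub-intervals separated by equal gaps."""
    head: int      # access interval head address
    sub_len: int   # length of each sub-access interval
    gap: int       # gap between two adjacent sub-access intervals

    def contains(self, lba: int) -> bool:
        offset = lba - self.head
        if offset < 0:
            return False
        # Position inside one period of (sub-interval + gap).
        return offset % (self.sub_len + self.gap) < self.sub_len


# Four nodes sharing a disk in 100-block slices: each node owns 100 blocks out of every 400.
node0 = StridedInterval(head=0, sub_len=100, gap=300)
node1 = StridedInterval(head=100, sub_len=100, gap=300)
print(node0.contains(450), node1.contains(450), node1.contains(550))
```

With this layout every control node can be described by three numbers regardless of how many sub-intervals it owns, which keeps the reported description information small.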

In one possible design, the access interval information is reported to the host through response information defined for the access interval query command, and the access interval information is carried in an interval description field of the response information.

A second aspect of the embodiments of the present invention provides a data processing method performed by a control node of a storage system, where the storage system communicates with a host through the NVMeoF protocol and includes a logical disk. The method comprises: receiving a state query command sent by the host, wherein the state query command instructs the control node to report the path state of the path on which the control node is located; when the logical disk comprises an access interval, reporting to the host a path state indicating that the logical disk comprises the access interval; receiving an access interval query command sent by the host according to the path state; and reporting, according to the interval query command, the access interval information of the access interval allocated to the control node to the host, so that the host records the mapping relation between the access interval information and the control node.

The access interval that the storage system sets for each control node is obtained through the state query command of the NVMeoF protocol and the newly added interval query command. When an I/O request is subsequently received, it can therefore be issued to the control node corresponding to the access interval in which the logical address of the I/O request falls, which avoids forwarding of the I/O request.

Various possible designs of the second aspect of the embodiment of the present invention are substantially the same as those of the first aspect of the present invention, and are not described herein again.

A third aspect of the embodiments of the present invention provides a data processing method performed by a host, where the host is connected to a storage system, the storage system includes a plurality of control nodes, and the host accesses a logical disk in the storage system through the plurality of control nodes. The method comprises the following steps: receiving an I/O request, wherein the I/O request carries a logical address of data to be accessed; determining the access interval to which the logical address belongs, wherein the logical disk comprises a plurality of access intervals, and the host records a mapping relation between each access interval and a control node; and transmitting the I/O request to the control node corresponding to the determined access interval.

Because an access interval is set for each control node of the storage system, a received I/O request can be issued to the control node corresponding to the access interval in which the logical address of the I/O request falls, which avoids forwarding of the I/O request.

In one possible design, the method further includes: sending a state query command to the plurality of control nodes, wherein the state query command instructs each control node to report the path state of the path on which that control node is located; receiving the path states reported by the control nodes; when a received path state indicates that the logical disk comprises an access interval, sending an interval query command to the control node that reported the path state; receiving the access interval information reported by that control node; and recording the mapping relation between that control node and the access interval information.

The access interval that the storage system sets for each control node is obtained through the state query command of the NVMeoF protocol and the newly added interval query command, so that the host can issue an I/O request to the control node corresponding to the access interval in which the logical address of the I/O request falls, which avoids forwarding of the I/O request.

In a possible design, when the control node that receives the state query command determines that the logical disk includes the access interval, that control node reports to the host the path state indicating that the logical disk includes the access interval.

By adding to the NVMe protocol a path state indicating that the logical disk includes an access interval, the host can decide from the path state whether to query the access interval.

Several other possible designs of embodiments of the present invention are the same as those provided in the first aspect, and are not described herein again.

A fourth aspect of the embodiments of the present invention provides a data processing method performed by a host, where the host communicates with a storage system via an external network, the method including: encapsulating a Non-Volatile Memory Express (NVMe) interval query command in an external network protocol to obtain an external network protocol interval query command, wherein the NVMe interval query command is used to query an access interval allocated to a control node of the storage system, and the access interval belongs to a namespace of the storage system; sending the external network protocol interval query command to the control node; receiving an external network protocol interval response message sent by the control node, wherein the external network protocol interval response message comprises an NVMe interval query command response, and the NVMe interval query command response comprises access interval information of the namespace; parsing the external network protocol interval response message to obtain the NVMe interval query command response; and obtaining and recording the access interval information of the control node from the NVMe interval query command response.
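The encapsulate and parse steps can be sketched with a toy framing. The header layout below (a 4-byte magic value followed by a payload length) is purely hypothetical and is not the capsule format of any real NVMeoF transport; only the 64-byte size of an NVMe submission queue entry is taken from the NVMe specification.

```python
import struct

SQE_SIZE = 64                       # size of an NVMe submission queue entry
_HEADER = struct.Struct("<4sI")     # hypothetical header: magic, payload length
MAGIC = b"XNET"                     # hypothetical marker for the external network protocol


def encapsulate(nvme_command: bytes) -> bytes:
    """Wrap a 64-byte NVMe command in the hypothetical external network framing."""
    if len(nvme_command) != SQE_SIZE:
        raise ValueError("an NVMe command is expected to be 64 bytes")
    return _HEADER.pack(MAGIC, len(nvme_command)) + nvme_command


def parse(message: bytes) -> bytes:
    """Strip the hypothetical framing and return the embedded NVMe payload."""
    magic, length = _HEADER.unpack_from(message)
    if magic != MAGIC:
        raise ValueError("not an external network protocol message")
    payload = message[_HEADER.size:_HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return payload


command = bytes([0x02]) + bytes(SQE_SIZE - 1)  # opcode 0x02 (Get Log Page), rest zeroed
wire = encapsulate(command)
print(len(wire), parse(wire) == command)
```

The same pair of functions applies in the opposite direction on the control node, which parses the incoming query and encapsulates its response.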

The access interval that the storage system sets for each control node is obtained through the newly added interval query command of the NVMeoF protocol. When an I/O request is subsequently received, it can therefore be issued to the control node corresponding to the access interval in which the logical address of the I/O request falls, which avoids forwarding of the I/O request.

In one possible implementation, before encapsulating the NVMe interval query command in the external network protocol to obtain the external network protocol interval query command, the method further includes:

encapsulating an NVMe state query command in the external network protocol to obtain an external network protocol state query command, wherein the NVMe state query command is used to query the path state of the path on which the control node is located;

sending the external network protocol state query command to the control node;

receiving an external network protocol state response message sent by the control node, wherein the external network protocol state response message comprises an NVMe state query command response, the NVMe state query command response comprises path state information, and the path state information indicates that the namespace comprises an access interval;

and parsing the external network protocol state response message to obtain the NVMe state query command response.

Through the NVMe state query command, the path state information can be reported to the host, and the host can determine from the path state information whether the logical disk is divided into access intervals.

Other possible implementation manners of the embodiment of the present invention are the same as various possible implementation manners of the first aspect, and are not described herein again.

A fifth aspect of the embodiments of the present invention provides a data processing method performed by a control node of a storage system, where a host and the storage system communicate with each other through an external network, and the method includes: receiving an external network protocol interval query command sent by the host, and parsing the external network protocol interval query command to obtain an NVMe interval query command; generating response information of the NVMe interval query command, wherein the response information comprises access interval information corresponding to the control node; encapsulating the response information of the NVMe interval query command in the external network protocol to obtain an external network protocol interval response message; and reporting the external network protocol interval response message to the host.

The access interval that the storage system sets for each control node is obtained through the newly added interval query command of the NVMeoF protocol. When an I/O request is subsequently received, it can therefore be issued to the control node corresponding to the access interval in which the logical address of the I/O request falls, which avoids forwarding of the I/O request.

In one possible design, before receiving the external network protocol interval query command sent by the host, the method further includes: receiving an external network protocol state query command, and parsing the external network protocol state query command to obtain an NVMe state query command, where the NVMe state query command instructs the control node to report the path state of the path on which the control node is located; generating response information of the NVMe state query command, and, when the namespace includes an access interval, carrying in the response information a path state indicating that the namespace includes the access interval; and encapsulating the response information of the NVMe state query command to obtain response information of the external network protocol state query command, and reporting the response information of the external network protocol state query command to the host.

Through the NVMe state query command, the path state information can be reported to the host, and the host can determine from the path state information whether the logical disk is divided into access intervals.

Other possible implementation manners of the embodiment of the present invention are the same as various possible implementation manners of the first aspect, and are not described herein again.

In a sixth aspect, an embodiment of the present invention further provides a host, where the host communicates with a storage system through an NVMeoF protocol, the storage system includes a logical disk, and the host accesses the logical disk through a control node in the storage system, and the host further includes a unit or a means for performing each step in the above first aspect.

In a seventh aspect, an embodiment of the present invention further provides a control node of a storage system, where a host communicates with the storage system through an NVMeoF protocol, the storage system includes a logical disk, the host accesses the logical disk through the control node in the storage system, and the control node further includes a unit or a means for performing each step in the second aspect.

In an eighth aspect, an embodiment of the present invention further provides a host, where the host communicates with a storage system through an NVMeoF protocol, the storage system includes a logical disk, and the host accesses the logical disk through a control node in the storage system, and the host includes a unit or a means for performing each step in the third aspect.

In a ninth aspect, embodiments of the present invention further provide a host, where the host communicates with a storage system through an external network, and the host includes a unit or means for performing the steps of the fourth aspect.

In a tenth aspect, embodiments of the present invention further provide a control node of a storage system, where a host communicates with the storage system through an external network, and the control node includes a unit or a means for performing the steps of the above fifth aspect.

In an eleventh aspect, embodiments of the present invention further provide a host, where the host communicates with a storage system through the NVMeoF protocol, the storage system includes a logical disk, and the host accesses the logical disk through a control node in the storage system. The host includes a memory and a processor, where the memory is used for storing programs and data, and the processor is used for running the programs stored in the memory and, according to the data stored in the memory, performing the methods provided in the first aspect, the third aspect, or the fourth aspect above.

In a twelfth aspect, an embodiment of the present invention further provides a control node of a storage system, where a host communicates with the storage system through an NVMeoF protocol, the storage system includes a logical disk, the host accesses the logical disk through the control node in the storage system, and the control node includes a memory and a processor, where the memory is used to store programs and data, and the processor is used to run the programs stored by the memory, and execute the methods provided in the second aspect or the fifth aspect according to the data stored by the memory.

Drawings

Fig. 1 is an architecture diagram of a system to which an embodiment of the present invention is applied.

FIG. 2 is a diagram illustrating N paths for a host to access a logical disk of a storage system according to an embodiment of the present invention.

Fig. 3a-3b are schematic diagrams illustrating how the address space of a logical disk in the storage system is divided into N continuous address spaces and into N discontinuous address spaces, respectively, according to an embodiment of the present invention.

Fig. 4 is an architecture diagram of a host in an embodiment of the invention.

Fig. 5 is a flowchart of a method for managing multiple paths for accessing the logical disk according to an embodiment of the present invention.

Fig. 6 is a diagram illustrating a path status query command according to an embodiment of the present invention.

Fig. 7 is a schematic diagram of status reporting information in an embodiment of the present invention.

Fig. 8 is a schematic diagram of a path state defined in the NVMeoF protocol in the embodiment of the present invention.

Fig. 9 is a schematic diagram illustrating an interval-related state carried in the state reporting information in the embodiment of the present invention.

FIG. 10 is a diagram illustrating an interval query command according to an embodiment of the present invention.

Fig. 11 is a schematic diagram of interval report information in the embodiment of the present invention.

Fig. 12 and 13 are schematic diagrams of the interval description information carried by the interval report information in an embodiment of the present invention, for an access interval with a continuous address space and for an access interval with a discontinuous address space, respectively.

FIG. 14 is a flow chart of assigning control nodes to execute I/O requests in an embodiment of the present invention.

Fig. 15 is a block diagram of a host provided in the embodiment of the present invention.

Fig. 16 is a block diagram of any control node in the storage system provided in the embodiment of the present invention.

Detailed Description

Fig. 1 is a block diagram of a system to which an embodiment of the present invention is applied. The host 100 is connected to two switches 200 through two host ports 101, for example, Host Bus Adapters (HBAs), respectively. The two switches 200 are each connected to the storage system 300. Two switches 200 and two host ports 101 are provided so that a failure of one switch 200 or one host port 101 does not disconnect the path between the host 100 and the storage system 300.

The storage system 300 includes a plurality of control nodes 302, such as Node 1 to Node N. Each control node 302 is connected to the two switches 200 via two storage ports 301 (e.g., HBA cards), respectively. Thus, there are two paths between the host and each control node 302; the two paths are redundant, and if one of them fails, the other may be used to transmit data between the host 100 and the control node 302. The storage system 300 includes a storage device 304 formed by a plurality of SSDs, and the storage device 304 may be a Redundant Array of Independent Disks (RAID) or a Just a Bunch of Flash (JBOF) enclosure. In some embodiments, the storage device 304 further comprises a virtual SSD, which is an SSD of another storage device mapped to the storage device 304, where the other storage device communicates with the storage device 304 via the NVMeoF (Non-Volatile Memory Express over Fabrics) protocol. Mapping a remote SSD to a local storage device via the NVMeoF protocol, so that it serves as a virtual SSD of the local storage device, is prior art and is not described here. The host 100 and the storage system 300 communicate with each other via the NVMeoF protocol.

The logical disk 303 may be constructed by using a local SSD and/or a virtual SSD in the storage device 304, where the logical disk 303 may be a Namespace (Namespace), and the Namespace is an expression manner of the logical disk defined in the NVMeoF protocol. In fig. 1, only one logical disk 303 is shown, but in practical applications, a plurality of logical disks may be constructed in the storage device 304, wherein one logical disk may be allocated to a plurality of hosts for use, or a plurality of logical disks may be allocated to one host for use.

In the embodiment of the present invention, as shown in fig. 1, the host 100 may access the logical disk 303 assigned to the host 100 through the N control nodes 302. For convenience of describing the paths between the host 100 and the logical disk 303, in fig. 2, the physical link between the host 100 and the logical disk in fig. 1 is omitted, and two redundant paths passing through the same control node are merged into one, so that N paths for the host 100 to access the logical disk 303 are obtained.

In the storage system 300, the construction and management of the logical disk 303 may be performed by any control node 302 designated by a user; here, it is assumed that the control node designated by the user is Node1. After the logical disk 303 is created, Node1 assigns a disk identifier (ID) and a disk code to the logical disk 303. The disk identifier uniquely identifies the logical disk; for example, the disk identifier includes a Globally Unique Identifier (GUID) applied to the NVMeoF protocol, vendor information, and product information. The disk code is used to distinguish the different logical disks built in the storage device 304; for example, the disk code of the logical disk 303 may be denoted as Namespace 1.

After the disk identifier and the disk code of the logical disk 303 are allocated, the Node1 allocates a virtual host to the logical disk 303, for example, the virtual host allocated to the logical disk 303 is a virtual host 1, and records the mapping relationship between the logical disk 303 and the virtual host: the virtual host 1: namespace 1.

After the disk identifier and the disk code of the logical disk 303 are allocated, Node1 can set the control nodes used for accessing the logical disk 303. For example, N control nodes are provided for the logical disk 303. To distinguish the control nodes 302, the storage system 300 assigns each control node 302 a unique identifier, i.e., a node identifier. The control nodes provided for the logical disk 303 may be represented by the following mapping relationship: Namespace 1: Node1, Node 2, ..., Node N.

In the embodiment of the present invention, Node1 may further set the access interval accessible to each control node 302. As shown in fig. 3a, Node1 divides the address space of the logical disk 303 into N continuous address spaces. For example, if the size of the storage space of the logical disk 303 is 1G, the access interval for Node1 is 0-99M, the access interval for Node 2 is 100M-199M, the access interval for Node 3 is 200M-399M, ..., and the access interval for Node N is 800M-1023M. Each access interval is a continuous address space, and the sizes of the access intervals may be the same or different.

In another embodiment, the access interval that Node1 allocates to each control node is not continuous. As shown in fig. 3b, the address space of the logical disk 303 is first divided into at least two equal sub-address spaces, and then each sub-address space is divided into N sub-access intervals, where each sub-access interval corresponds to one control node. Every sub-address space is divided in the same way; that is, the sub-access interval corresponding to a given control node has the same size and the same position in every sub-address space. The set of sub-access intervals corresponding to a control node constitutes the access interval of that control node. For example, if the logical disk 303 is divided into 4 sub-address spaces, the access interval of Node1 is: 0 to a0, d0+1 to a1, and d1+1 to a2; the access interval of Node 2 is: a0+1 to b0, a1+1 to b1, and a2+1 to b2; the access interval of Node N is: c0+1 to d0, c1+1 to d1, and c2+1 to d2. The sub-access intervals that make up the access interval of a control node are equal in size and equally spaced. When recording the access interval allocated to each control node, the start address of the access interval, the address length of the sub-access intervals constituting the access interval, the address interval between adjacent sub-access intervals, and the number of sub-access intervals constituting the access interval may be recorded. After the access intervals are allocated to the N control nodes 302, each control node 302, in response to a request from the host, reports its access interval to the host 100, and the host can issue I/O requests accordingly. The method for reporting the access interval is described below.
In this embodiment of the present invention, if the storage device 304 is further configured with other logical disks besides the logical disk 303, the other logical disks may or may not be divided into access intervals, according to actual needs.
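
The record kept for an access interval (start address, sub-access interval length, interval between adjacent sub-access intervals, and number of sub-access intervals) can be illustrated with a minimal Python sketch. The class name `AccessInterval`, its field names, and the reading of the "address interval" as the distance from the end of one sub-access interval to the start of the next are assumptions made for illustration and are not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessInterval:
    """Hypothetical record of the access interval assigned to one control node."""
    start: int       # start address of the first sub-access interval
    sub_length: int  # address length of each sub-access interval
    gap: int         # assumed: addresses between the end of one sub-access
                     # interval and the start of the next (0 if continuous)
    count: int       # number of sub-access intervals (1 if continuous)

    def contains(self, address: int) -> bool:
        """Return True if the address falls inside one of the sub-access intervals."""
        offset = address - self.start
        if offset < 0:
            return False
        # Each sub-access interval repeats every (sub_length + gap) addresses.
        index, within = divmod(offset, self.sub_length + self.gap)
        return index < self.count and within < self.sub_length


# Example: 4 control nodes, sub-address spaces of 400 addresses, 100 per node.
# The second node owns 100-199, 500-599, and 900-999.
node2 = AccessInterval(start=100, sub_length=100, gap=300, count=3)
# A continuous access interval is the special case gap=0, count=1.
node1_continuous = AccessInterval(start=0, sub_length=100, gap=0, count=1)
```

With this representation, a continuous access interval (fig. 3a) and a discontinuous one (fig. 3b) share one record format, which is one possible reason for recording the four values listed above.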

As shown in fig. 1, when the N control nodes are connected to the host port 101 through the storage port 301 and the switch 200, each control node 302 may obtain an identifier of the host port 101 connected thereto, and then replace a mapping relationship between a virtual host and a disk code of the logical disk 303 with a mapping relationship between the host port identifier and the disk code of the logical disk, where the mapping relationship after replacement may be represented as: HBA1, HBA2 (virtual host 1): namespace 1. In this embodiment of the present invention, for redundancy, the host 100 is connected to the storage system 300 through two host ports 101, where the identifiers of the two host ports are recorded in the mapping relationship, that is: HBA1, HBA 2.

The various mapping relationships generated above, for example, the mapping relationship between the disk code of the logical disk and the node identifiers of the N control nodes 302, the mapping relationship between the host port identifier and the disk code of the logical disk, and the mapping relationship between each control node identifier and the access interval are stored in the storage area accessible by the N control nodes 302.

Fig. 4 is a diagram showing an architecture of the host 100 according to the embodiment of the present invention. In addition to the two host ports 101, the host 100 further includes a processor 102 and a memory 103, and the memory 103 stores a multipath program 104 and an operating system 105. The processor 102 implements management of multiple paths to access the logical disk 303 by the host 100 by executing the operating system 105 and the multipath program 104. The management of multiple paths for accessing the logical disk 303 by the host 100 will be described below by a flowchart shown in fig. 5.

In step S501, when the operating system 105 of the host 100 monitors that the storage system 300 is connected to the host 100, the operating system 105 sends a disk report command to the storage system 300 through multiple paths between the host 100 and the storage system 300 (i.e., multiple paths formed between the storage ports 301 of the N control nodes and the host port 101). When the disk report command passes through the host port 101, the host port identifier is carried in the disk report command.

Step S502, after each control node 302 of the storage system 300 receives the disk report command, each control node 302 acquires a host port identifier in the disk report command, and determines a disk code corresponding to the host port identifier according to the host port identifier and a mapping relationship between the host port identifier and a disk code of a logical disk.

In step S503, each control node 302 of the storage system 300 generates report information for the logical disk corresponding to the disk code, carries the disk code in the report information, and reports the report information to the host 100. Taking Node1 as an example, the reporting process of the report information is described below. The Node1 is connected with two HBA cards of the host through two HBA cards respectively, thus two paths exist between the Node1 and the host, when the Node1 receives two disk reporting commands through the two HBA cards respectively, the Node obtains the identifications of the two HBA cards from the two disk reporting commands respectively, and then obtains the same disk code according to the two HBA card identifications. After the disk codes are obtained, the disk codes can be reported to the host 100 through two paths between the Node1 and the host 100 respectively.

According to the NVMeoF protocol, while the report information is being reported, the identifier of the control node that reports it, the storage port 301 through which it passes (for example, one of the two HBA ports of Node 1), and the host port 101 through which it passes are added to the report information. The control node identifier, the storage port, the host port, and the disk code in the report information thus together represent the path over which the disk code is reported.

In step S504, after receiving the report information reported by each control node 302 of the storage system 300, the multi-path program 104 of the host 100 sends a disk identifier query command through a path indicated by the path information in each report information, where the query command includes a disk code in the report information.

Step S505, after receiving the query command, each control node 302 in the storage system 300 obtains a disk identifier corresponding to the disk code according to the disk code in the query command.

Step S506, each control node 302 in the storage system 300 reports the disk identifier corresponding to the disk code through the path for sending the query command. Since the storage system 300 allocates the disk identifier and the disk code to each logical disk 303 when creating the logical disk 303, the storage system 300 can obtain the disk identifier according to the disk code after receiving the disk code.

In step S507, the multipath program 104 of the host 100 determines a path for accessing the logical disk corresponding to the disk identifier according to the reported disk identifier, and manages the determined path.

Specifically, when receiving a disk identifier for the first time, the multipath program 104 creates a disk object according to the disk identifier, including assigning a disk object name to the disk object, establishing a mapping relationship between the disk object name and the disk identifier, and recording a path that reports the disk identifier as a path of the disk object. Further, other related information of the disk, such as address space, capacity size, etc., may also be recorded in the disk object. And if the same disk identifier is received subsequently, recording a path for reporting the disk identifier subsequently as another path of the disk object.

Since, in the embodiment of the present invention, a redundant path is provided between each control node 302 and the host 100 for reliability, when the multi-path program manages a path for accessing each logical disk 303, the redundant paths between each control node 302 and the host 100 are merged into one. When merging, the paths of the same control node can be recorded together according to the control node identification. Thus, when the host issues the I/O request, one of the redundant paths is selected to issue the I/O, and when one of the paths fails, the other path is used for replacing the failed path to issue the I/O. For convenience of description, the merged path, that is, the case where only one path exists between each control node 302 and the host 100, is described below.

In this way, the multipath software 104 may manage multiple paths for accessing the disk through the disk object, for example, find a new path, delete a disconnected path, select a path for issuing an IO request according to a preset policy, and the like.

For example, if the multipath software 104 first receives the disk identifier of the logical disk whose disk code is Namespace1 on path 1, it establishes a disk object for Namespace1, assigns the disk object name sda to the disk object, establishes a mapping relationship between sda and the disk identifier, and records path 1 as the first path of the disk object. If the disk identifier of Namespace1 is then received on path 2 and path 3, path 2 and path 3 are recorded as further paths of the disk object. In this embodiment, after the multipath software receives the disk identifier of Namespace1 2N times on 2N paths and merges the paths of the same control node, the multipath software manages N paths between the host and the storage system.
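
The creation of a disk object and the merging of redundant paths described above can be sketched as follows. This is only an illustrative Python sketch: the names `DiskObject`, `MultipathTable`, `on_disk_identifier`, and `select_path`, the representation of a path as a (host port, control node) tuple, and the "sd" plus letter naming are assumptions, not the multipath program of the embodiment.

```python
from collections import defaultdict


class DiskObject:
    """Hypothetical host-side record for one logical disk."""

    def __init__(self, name, disk_id):
        self.name = name
        self.disk_id = disk_id
        # Redundant paths are grouped by control node identifier, so each
        # control node counts as one merged path.
        self.paths_by_node = defaultdict(list)


class MultipathTable:
    def __init__(self):
        self.disks = {}  # disk identifier -> DiskObject

    def on_disk_identifier(self, disk_id, node_id, path):
        """Record the path on which a disk identifier was reported."""
        disk = self.disks.get(disk_id)
        if disk is None:
            # First report of this identifier: create the disk object.
            # Assumed naming scheme (sda, sdb, ...), valid for up to 26 disks.
            name = "sd" + chr(ord("a") + len(self.disks))
            disk = DiskObject(name, disk_id)
            self.disks[disk_id] = disk
        if path not in disk.paths_by_node[node_id]:
            disk.paths_by_node[node_id].append(path)
        return disk


def select_path(disk, node_id, failed=()):
    """Pick the first redundant path to a control node that has not failed."""
    for path in disk.paths_by_node.get(node_id, []):
        if path not in failed:
            return path
    return None


# Example: 3 control nodes, each reachable through 2 host ports (2N = 6 paths).
table = MultipathTable()
for node in ("Node 1", "Node 2", "Node 3"):
    for hba in ("HBA1", "HBA2"):
        disk = table.on_disk_identifier("guid-1", node, (hba, node))
```

After the loop, `disk.paths_by_node` holds three entries, one per control node, each with two redundant paths, which corresponds to the 2N paths being merged into N.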

In step S508, after the access path is established for the logical disk, the multi-path software 104 sends a path state query command to the storage system 300, where the state query command is used to instruct each control node to report the path state of the path where the control node is located.

In the embodiment of the present invention, the host 100 and the storage system 300 communicate with each other through the NVMeoF protocol. The host first generates a Non-Volatile Memory Express (NVMe) path state query command and then encapsulates the NVMe path state query command in the external network protocol to obtain an external network protocol path state query command; the path state query command sent in this step is the external network protocol path state query command.

The NVMe path state query command is a command defined in the existing NVMe protocol, specifically a command defined by Get Log Page - Command Dword 10. As shown in fig. 6, the command identifier 0Ch of the path state query command is carried in the Log Page Identifier (LID) field, which is located at bits 07:00 of Command Dword 10. Other fields in the command are not relevant to the embodiments of the present invention and are not described here; the following description likewise covers only the fields relevant to the embodiments of the present invention.
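
As an illustration of how the LID field selects the log page, the following Python sketch builds Command Dword 10 of a Get Log Page command. The function names are hypothetical; the placement of the dword count (a zero-based value in bits 31:16) follows the NVMe base specification as commonly published and should be checked against the revision actually in use.

```python
PATH_STATE_LOG_PAGE_ID = 0x0C  # command identifier 0Ch carried in the LID field


def build_get_log_page_cdw10(lid: int, num_dwords: int) -> int:
    """Build Command Dword 10 with the LID in the low 8 bits.

    num_dwords is the number of dwords to return; it is stored zero-based
    in bits 31:16 (assumed layout, see the lead-in).
    """
    if not 0 <= lid <= 0xFF:
        raise ValueError("LID must fit in 8 bits")
    if not 1 <= num_dwords <= 0x10000:
        raise ValueError("dword count must fit in the 16-bit field")
    return ((num_dwords - 1) << 16) | lid


def lid_of(cdw10: int) -> int:
    """Extract the log page identifier from Command Dword 10."""
    return cdw10 & 0xFF


cdw10 = build_get_log_page_cdw10(PATH_STATE_LOG_PAGE_ID, num_dwords=1024)
```

A control node receiving the command can apply `lid_of` to decide which log page is being requested.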

In the embodiment of the present invention, the external network protocol may be Fibre Channel (FC), InfiniBand, RoCE (RDMA over Converged Ethernet), iWARP (RDMA over TCP), Peripheral Component Interconnect Express (PCIe), or another network protocol.

In step S509, the control node that receives the path state query command obtains the path state, and reports the path state to the multipath software 104 of the host computer in the state report information.

Because the path state query command is an external network protocol path state query command, the control node that receives it first parses the path state query command to obtain the NVMe path state query command, and then queries the path state of the path where that control node is located, as indicated by the NVMe path state query command.

In the NVMeoF protocol, several path states are predefined, as shown in fig. 8. The path states indicated by state codes 01h to 04h are defined in the existing protocol. The state code 01h indicates the preferred state (i.e., the ANA Optimized State); after receiving state reporting information including the state code 01h, the host 100 sets the path that reported the state reporting information as a preferred path. The state code 02h indicates the non-preferred state (i.e., the ANA Non-Optimized State); after receiving state reporting information including the state code 02h, the host 100 sets the path that reported the state reporting information as a non-preferred path. The state code 03h indicates the inaccessible state (i.e., the ANA Inaccessible State); after receiving state reporting information including the state code 03h, the host 100 sets the path that reported the state reporting information as an inaccessible path. The state code 04h indicates the failure state (i.e., the ANA Persistent Loss State); after receiving state reporting information including the state code 04h, the host 100 sets the path that reported the state reporting information as a failed path.

The space-dependent state (ANA LBA-dependent State) indicated by the state code 05h is a new state added in the embodiment of the present invention. The space-dependent state indicates that the logical disk 303 is divided into a plurality of access intervals. After receiving state reporting information including the state code 05h, the host 100 acquires the access interval corresponding to the control node that reported the state; the specific acquisition manner is described in detail below.
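
The state codes 01h to 05h and the host's reaction to each can be summarized in a short Python sketch. The enum member names and the `host_action` helper are illustrative assumptions; only the numeric codes and their meanings come from the description above.

```python
from enum import IntEnum


class PathState(IntEnum):
    OPTIMIZED = 0x01        # preferred state
    NON_OPTIMIZED = 0x02    # non-preferred state
    INACCESSIBLE = 0x03     # inaccessible state
    PERSISTENT_LOSS = 0x04  # failure state
    LBA_DEPENDENT = 0x05    # space-dependent state added by the embodiment


def host_action(state_code: int) -> str:
    """Describe what the host does after receiving a state code (sketch)."""
    state = PathState(state_code)  # raises ValueError for unknown codes
    if state is PathState.LBA_DEPENDENT:
        # The logical disk is divided into access intervals, so the host
        # goes on to query the access interval of the reporting control node.
        return "send interval query command"
    return "record path as " + state.name.lower()
```

For example, `host_action(0x01)` yields a "record path" action, while `host_action(0x05)` yields the interval query action.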

The control node that receives the path state query command acquires the path state as follows: it determines whether the logical disk 303 is divided into a plurality of access intervals. When the logical disk 303 is not divided into a plurality of access intervals, the control node acquires the path state of the path where it is located. In the storage system, the state of each path is identified in advance, for example, the preferred path and non-preferred path preset for accessing the logical disk 303, as well as any path already identified as inaccessible or failed. When the logical disk 303 is divided into a plurality of access intervals, the path state is determined to be the space-dependent state. After acquiring the path state, the control node reports the state code of the path state to the host 100 through the state reporting information shown in fig. 7.

After the path state is determined, response information of the NVMe path state query command is generated. Fig. 7 shows the format of the response information of the NVMe path state query command in the existing NVMeoF protocol. The response information includes a number-of-disks field, located at bytes 07:04 of the response information, which carries the number of logical disks that the host 100 can access. Since the embodiment of the present invention is described using only one logical disk as an example, this field is filled with 1. The response information further includes a path state field, located at byte 16 of the response information, which carries the path state. When the logical disk 303 is not divided into a plurality of access intervals, byte 16 is filled with one of the codes 01h to 04h indicating the path state; for example, if the path where the control node is located is in the preferred state, the control node carries the state code 01h in byte 16. When the logical disk 303 is divided into a plurality of access intervals, as shown in fig. 9, the code 05h indicating the space-dependent state is carried in byte 16 of the response information.
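
The two fields named above (the number of logical disks at bytes 07:04 and the path state at byte 16) can be illustrated with a small Python sketch that fills and reads them. The 20-byte buffer length, the little-endian byte order, and the function names are assumptions for illustration; all other bytes of the response are simply left as zero here.

```python
import struct

REPORT_LENGTH = 20  # hypothetical length, just long enough to hold byte 16


def build_status_report(disk_count: int, state_code: int) -> bytes:
    """Fill the number-of-disks field (bytes 07:04) and the path state (byte 16)."""
    buf = bytearray(REPORT_LENGTH)
    struct.pack_into("<I", buf, 4, disk_count)  # assumed little-endian
    buf[16] = state_code                        # one of 01h to 05h
    return bytes(buf)


def parse_status_report(report: bytes):
    """Return (disk_count, state_code) from a report built as above."""
    (disk_count,) = struct.unpack_from("<I", report, 4)
    return disk_count, report[16]


# A control node whose logical disk is divided into access intervals reports 05h.
report = build_status_report(disk_count=1, state_code=0x05)
```

On the host side, `parse_status_report(report)` recovers the state code that decides whether an interval query command is sent.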

And the generated response information of the NVMe path inquiry command is encapsulated into the external network protocol to obtain an external network protocol state response message, and the external network protocol state response message is reported to the host as report information.

In step S510, after receiving the report information, the multi-path software 104 of the host determines whether the path state in the report information indicates that the logical disk 303 includes multiple access intervals.

After receiving the report information, the multi-path software 104 of the host parses the report information to obtain the NVMe path state query command response information encapsulated in it, and then obtains the state code indicating the path state from the response information. If the state code is 05h, the multi-path software 104 determines that the logical disk 303 includes multiple access intervals; if the state code is one of 01h to 04h, the multi-path software 104 determines that the logical disk is not divided into multiple access intervals.

Step S511, when the multi-path software determines that the logical disk is not divided into multiple access intervals, the multi-path software records the state code as the path state of the path that reported the state reporting information, for reference when the multi-path software 104 selects a path for I/O.

Step S512, when the multi-path software 104 determines that the logical disk is divided into multiple access intervals, the multi-path software 104 sends an interval query command to the storage system 300, where the interval query command is used to instruct the control nodes in the storage system to report the access intervals allocated to them.

Before sending the interval query command, the multi-path software first generates an NVMe interval query command. The NVMe interval query command does not exist in the existing NVMe protocol; it is a command newly defined on the basis of Get Log Page - Command Dword 10, that is, a sub-command added to the NVMeoF protocol. As shown in fig. 10, the command format of the NVMe interval query command is the same as that of the NVMe path state query command, except that the LID field carries the command identifier of the NVMe interval query command, for example, 0Dh. After the NVMe interval query command is generated, it is encapsulated in the external network protocol to obtain an external network protocol interval query command, which is the interval query command sent in this step.

In the storage system 300, response information is defined for the NVMe interval query command. As shown in fig. 11, the Namespace identification field of the response information carries the disk code of the logical disk, for example, Namespace1, and is located at bytes 03:00 of the response information. The access interval number field carries the number of access intervals carried in the interval report information and is located at bytes 11:08 of the response information. In the embodiment of the present invention, each control node only accesses one logical disk, so the number of access intervals is 1. In other embodiments, the host 100 may access logical disks other than the logical disk 303 through the control node, in which case the number of access intervals filled in here may not be 1. The response information further includes an access interval description field, for example, user data segment descriptor 1, which carries the access interval information of one access interval and is located at bytes 27:12 of the response information. When the interval report information carries multiple access intervals, multiple access interval description fields may be added to the response information, for example, by writing the information of each further access interval sequentially in the bytes after byte 27. The description of each access interval is shown in fig. 12 and 13.

Referring to the descriptions of fig. 3a and fig. 3b, there are two ways of dividing the access interval, one is the way of dividing the continuous address space shown in fig. 3a, and the other is the way of dividing the non-continuous address space shown in fig. 3 b.

If the access interval is a continuous address space, as shown in fig. 12, the access interval description information includes an access interval first address field, for example at bytes 07:00 of the access interval description information, which carries the first address of the access interval, and an interval length field, for example at bytes 11:08 of the access interval description information, which carries the length of the access interval. Because the address space is continuous, there are no sub-access intervals, and bytes 15:12, which carry the interval between sub-access intervals, are 0. If the access interval is a discontinuous address space, as shown in fig. 13, bytes 07:00 indicate the first address of the first sub-access interval, bytes 11:08 indicate the length of each sub-access interval, and bytes 15:12 indicate the interval between two adjacent sub-access intervals.
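
The 16-byte access interval description (first address at bytes 07:00, length at bytes 11:08, interval between adjacent sub-access intervals at bytes 15:12) maps naturally onto a fixed binary layout. The Python sketch below packs and unpacks such a description; the little-endian byte order and the function names are assumptions for illustration.

```python
import struct

# 8-byte first address, 4-byte length, 4-byte interval: 16 bytes in total.
# Little-endian order is an assumption made for this sketch.
DESCRIPTOR = struct.Struct("<QII")


def pack_descriptor(first_address: int, length: int, gap: int = 0) -> bytes:
    """Pack one access interval description; gap=0 means a continuous interval."""
    return DESCRIPTOR.pack(first_address, length, gap)


def unpack_descriptor(data: bytes):
    """Return (first_address, length, gap) from a 16-byte description."""
    return DESCRIPTOR.unpack(data)


continuous = pack_descriptor(first_address=0x1000, length=0x800)
discontinuous = pack_descriptor(first_address=0x100, length=0x100, gap=0x300)
```

A reader of the description can distinguish the two division modes by checking whether the last field is zero.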

In step S513, each control node of the storage system receives the interval query command, generates interval report information, and reports the interval report information to the host 100.

When the interval query command is received, the control node that receives the query command analyzes the interval query command to obtain the NVMe interval query command, then carries the access interval of the control node that receives the query command in the response information of the NVMe interval query command according to the format of the response information of the predefined NVMe interval query command, encapsulates the NVMe interval query command response information into the external network protocol to form the external network protocol interval response message, and reports the external network protocol interval response message as interval report information to the host 100.

Step S514, after receiving the interval report information, the multi-path software 104 of the host acquires the access interval information from the interval report information, and then records the access interval information corresponding to the control node reporting the interval report information, so that when receiving the I/O request, the multi-path software can allocate the I/O request according to the information.

When the multi-path software 104 of the host acquires the access interval information, the interval report information is firstly analyzed to obtain the NVMe interval query command response information, and then the access interval information is acquired from the NVMe interval query command response information.
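
How the host might extract the access interval information from the NVMe interval query command response can be sketched in Python as follows. The sketch assumes the layout stated earlier in the description (disk code at bytes 03:00, number of access intervals at bytes 11:08, 16-byte descriptions starting at byte 12), a little-endian byte order, and a numeric disk code; the function names are hypothetical, and bytes 07:04, which the description does not define, are left as zero.

```python
import struct

HEADER_SIZE = 12                    # descriptions start at byte 12
DESCRIPTOR = struct.Struct("<QII")  # first address, length, interval (assumed LE)


def build_interval_response(namespace_id, intervals):
    """Control node side: build the response for a list of (first, length, gap)."""
    buf = bytearray(HEADER_SIZE + DESCRIPTOR.size * len(intervals))
    struct.pack_into("<I", buf, 0, namespace_id)    # bytes 03:00
    struct.pack_into("<I", buf, 8, len(intervals))  # bytes 11:08
    for i, (first, length, gap) in enumerate(intervals):
        DESCRIPTOR.pack_into(buf, HEADER_SIZE + i * DESCRIPTOR.size,
                             first, length, gap)
    return bytes(buf)


def parse_interval_response(data):
    """Host side: return (namespace_id, [(first, length, gap), ...])."""
    (namespace_id,) = struct.unpack_from("<I", data, 0)
    (count,) = struct.unpack_from("<I", data, 8)
    intervals = [
        DESCRIPTOR.unpack_from(data, HEADER_SIZE + i * DESCRIPTOR.size)
        for i in range(count)
    ]
    return namespace_id, intervals


response = build_interval_response(1, [(0x100, 0x100, 0x300)])
namespace_id, intervals = parse_interval_response(response)
```

The host would then record `intervals` against the control node that reported the response, which gives the mapping between control node and access interval information.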

Fig. 14 shows a flowchart for assigning a control node to execute an I/O request when the host 100 receives the I/O request.

Step 1301, the host 100 receives an I/O request, where the I/O request carries the logical address of the data to be accessed.

Step 1302, the host 100 determines the access interval in which the logical address is located.

Step 1303, the host 100 determines the control node for processing the I/O request according to the determined access interval.

Step 1304, the host 100 issues the I/O request to the control node, and the control node processes the I/O request.

Therefore, an I/O request issued by the host is processed by the control node corresponding to the access interval that contains the logical address of the data to be accessed. This avoids I/O requests with the same logical address being issued to different control nodes and then forwarded between them, and thereby reduces the latency of I/O requests.
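
Steps 1301 to 1304 amount to a lookup from logical address to control node. The Python sketch below shows one possible lookup using a sorted list and binary search; the class name `IntervalRouter`, the (first address, length, node identifier) tuples, and the use of `bisect` are assumptions for illustration rather than the method of the embodiment. A discontinuous access interval can be handled by registering each of its sub-access intervals as a separate entry.

```python
import bisect


class IntervalRouter:
    """Map a logical address to the control node owning its access interval."""

    def __init__(self, intervals):
        # intervals: iterable of (first_address, length, node_id), non-overlapping
        self._entries = sorted(intervals)
        self._starts = [entry[0] for entry in self._entries]

    def node_for(self, logical_address):
        # Find the last interval whose first address is <= logical_address.
        i = bisect.bisect_right(self._starts, logical_address) - 1
        if i < 0:
            return None
        first, length, node_id = self._entries[i]
        return node_id if logical_address < first + length else None


MIB = 1 << 20
router = IntervalRouter([
    (0, 100 * MIB, "Node 1"),
    (100 * MIB, 100 * MIB, "Node 2"),
])
target = router.node_for(150 * MIB)  # control node to which the I/O is issued
```

An address that falls in no recorded access interval returns `None`, in which case a host would need some fallback policy; the description does not specify one.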

Fig. 15 and fig. 16 are block diagrams of the host and of any control node in the storage system, respectively, provided in the embodiment of the present invention. The host includes a path management module 1501, a path state query module 1502, an interval query module 1503, an interval recording module 1504, an interval determination module 1505, and an I/O issuing module 1506. The control node includes a disk code reporting module 1601, a disk identifier reporting module 1602, a path state reporting module 1603, and an interval reporting module 1604.

The path management module 1501 of the host is configured to send a disk report command to the storage system 300 through a plurality of paths between the host 100 and the storage system 300 respectively when the operating system 105 of the host 100 monitors that the storage system 300 is connected to the host 100.

After receiving the disk report command, the disk code reporting module 1601 of the control node of the storage system 300 obtains a host port identifier in the disk report command, determines a disk code corresponding to the host port identifier according to the host port identifier and a mapping relationship between the host port identifier and a disk code of a logical disk, generates report information for the logical disk corresponding to the disk code, carries the disk code in the report information, and reports the report information to the host 100.

After receiving the report information reported by each control node 302 of the storage system 300, the path management module 1501 of the host sends a disk identifier query command through a path indicated by the path information in each report information, where the query command includes a disk code in the report information.

After receiving the query command, the disk identifier reporting module 1602 of the control node of the storage system 300 obtains the disk identifier corresponding to the disk code according to the disk code in the query command, and reports the disk identifier corresponding to the disk code through the path through which the query command is sent.

The path management module 1501 of the host determines a path for accessing the logical disk corresponding to the disk identifier according to the reported disk identifier, and manages the determined path.

The functions performed by the path management module 1501 of the host are the same as those performed in steps S501, S504, and S507 of Fig. 5. The functions performed by the disk code reporting module 1601 of the control node are the same as steps S502 and S503 in Fig. 5, and the functions performed by the disk identifier reporting module 1602 of the control node are the same as steps S505 and S507 in Fig. 5. For details, refer to the description of the corresponding steps in Fig. 5.

After the access paths are established for the logical disk, the path state query module 1502 of the host sends a path state query command to the storage system 300, where the path state query command is used to instruct each control node to report the path state of the path where the control node is located. The function performed by the path state query module 1502 is the same as the function performed in step S508, so for implementation details of the path state query module 1502, refer to the related description of step S508.

When receiving the path state query command, the path state reporting module 1603 of the control node carries the path state in status report information and reports the status report information to the host. For implementation details of the path state reporting module 1603, refer to the related description of step S509.

After receiving the status report information, the interval query module 1503 of the host determines whether the path state in the status report information indicates that the logical disk 303 includes multiple access intervals. When it determines that the logical disk is not divided into multiple access intervals, it records the path code as the path state of the path that reported the status report information, for reference when the multi-path software 104 selects a path for I/O. When it determines that the logical disk is divided into multiple access intervals, it sends an interval query command to the storage system 300, where the interval query command is used to instruct the control nodes in the storage system to report the access interval allocated to each control node. For implementation details of the interval query module 1503, refer to the related descriptions of steps S510 to S512.
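The branch taken by the interval query module can be sketched as follows. This is only an illustration under stated assumptions: the state code INTERVAL_STATE, the function handle_status_report, and the callback send_interval_query are hypothetical, and the actual encoding of the path state is whatever the embodiment defines.

```python
# Hypothetical path state code meaning "the logical disk is divided into
# multiple access intervals"; the real encoding is not given here.
INTERVAL_STATE = "interval"


def handle_status_report(path, path_state, path_states, send_interval_query):
    """Process one status report received on the given path.

    Returns True if an interval query command was sent, False if the
    path state was simply recorded for later path selection.
    """
    if path_state == INTERVAL_STATE:
        # Disk is divided into access intervals: ask for the interval
        # assigned to the control node on this path.
        send_interval_query(path)
        return True
    # Disk is not divided: remember the state for the multi-path software.
    path_states[path] = path_state
    return False


path_states = {}
sent_queries = []

queried_a = handle_status_report(
    "path-A", "optimized", path_states, sent_queries.append
)
queried_b = handle_status_report(
    "path-B", INTERVAL_STATE, path_states, sent_queries.append
)
```

Here "optimized" is just a placeholder for any state other than the assumed interval state; only path-B triggers an interval query.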

The interval reporting module 1604 of the control node receives the interval query command, generates interval report information, and reports the interval report information to the host 100. For implementation details of the interval reporting module 1604, refer to the related description of step S513.

After receiving the interval report information, the interval recording module 1504 of the host obtains the access interval information from the interval report information and records it against the control node that reported the interval report information, so that when an I/O request is received, the I/O request can be dispatched according to the recorded mapping.

The interval determining module 1505 of the host is configured to receive an I/O request, where the I/O request carries a logical address of data to be accessed, and determine an access interval where the logical address is located.

The I/O issuing module 1506 is configured to determine a control node that processes the I/O request according to the determined access interval, and then issue the I/O request to the control node, where the control node processes the I/O request.
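The recording, interval determination, and issuing steps can be sketched together as a sorted interval table. This is a minimal illustration, not the embodiment's data structure: the class IntervalTable, its methods, the node names, and the use of half-open address ranges [start, end) are all assumptions made for the example.

```python
import bisect


class IntervalTable:
    """Hypothetical host-side record of access interval -> control node."""

    def __init__(self):
        self._starts = []   # interval start addresses, kept sorted
        self._entries = []  # (start, end, node), in the same order

    def record(self, node, start, end):
        """Record that logical addresses in [start, end) belong to node."""
        idx = bisect.bisect_left(self._starts, start)
        self._starts.insert(idx, start)
        self._entries.insert(idx, (start, end, node))

    def node_for(self, logical_address):
        """Return the control node owning the address, or None if unmapped."""
        # Find the last interval whose start is <= logical_address.
        idx = bisect.bisect_right(self._starts, logical_address) - 1
        if idx >= 0:
            start, end, node = self._entries[idx]
            if start <= logical_address < end:
                return node
        return None


def issue_io(table, logical_address):
    """Pick the control node that should process an I/O to this address."""
    node = table.node_for(logical_address)
    if node is None:
        raise LookupError(f"no access interval covers {logical_address}")
    return node


table = IntervalTable()
table.record("node-B", 1000, 2000)  # reports may arrive in any order
table.record("node-A", 0, 1000)

target = issue_io(table, 1500)
```

Keeping the interval starts sorted lets the lookup use binary search, so the cost per I/O request grows only logarithmically with the number of recorded intervals. A plain linear scan would work equally well for a handful of control nodes.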

The above description covers only specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
