Method and controller for controlling data readout to a host

Document No.: 115239. Publication date: 2021-10-19.

Abstract: This technology, a method and controller for controlling data readout to a host, was designed by 刘传杰 and 张泽 on 2021-07-01. The present application relates to a method and a controller for controlling data readout to a host. The controller comprises an SGL circuit and/or a PRP circuit, a read control circuit, and a shared memory. The SGL circuit, in response to a received read command, acquires the SGL corresponding to the read command, generates at least one DMA command group from the SGL, each DMA command group including at least one DMA command, and stores the DMA command group in the shared memory. The PRP circuit, in response to a received read command, acquires the PRP corresponding to the read command, generates at least one DMA command group from the PRP, each DMA command group including at least one DMA command, and stores the DMA command group in the shared memory. The read control circuit obtains a stored DMA command group from the shared memory and, in response to receiving data corresponding to a given DMA command group, determines the corresponding host memory address from that DMA command group and moves the data to the host. The shared memory stores the DMA command groups. The technical solution of the present application can improve the processing efficiency of read commands.

1. A method for controlling data read out to a host, comprising:

in response to a received read command, acquiring an SGL or PRP corresponding to the read command;

generating at least one DMA command group from the SGL or PRP, each DMA command group including at least one DMA command;

storing the DMA command group in a shared memory;

obtaining a stored DMA command group from the shared memory; and

in response to receiving an indication of data corresponding to a given DMA command group, moving the data to the host according to that DMA command group.

2. The method of claim 1, further comprising:

processing a plurality of read commands in parallel through one or more SGL processing units, or processing a plurality of read commands in parallel through one or more PRP processing units, to obtain the SGL or PRP corresponding to each read command; and storing one or more DMA command groups corresponding to each read command in the shared memory;

processing each read command and moving the data to be accessed by each read command from the NVM to the memory of the storage device; and

in response to the data corresponding to a first DMA command group having been moved to the memory of the storage device, acquiring the one or more DMA commands of the first DMA command group from the shared memory and transferring the data according to those DMA commands.

3. A controller for controlling data readout to a host, comprising: an SGL circuit and/or a PRP circuit, a read control circuit, and a shared memory, wherein:

the SGL circuit is configured to: in response to a received read command, acquire an SGL corresponding to the read command; generate at least one DMA command group from the SGL, each DMA command group including at least one DMA command; and store the DMA command group in the shared memory;

the PRP circuit is configured to: in response to a received read command, acquire a PRP corresponding to the read command; generate at least one DMA command group from the PRP, each DMA command group including at least one DMA command; and store the DMA command group in the shared memory;

the read control circuit is configured to: obtain a stored DMA command group from the shared memory; and, in response to receiving data corresponding to a given DMA command group, determine a corresponding host memory address according to that DMA command group and move the data to the host; and

the shared memory is configured to store the DMA command group.

4. The controller of claim 3, wherein:

the SGL circuit comprises a plurality of parallel SGL branches, and the PRP circuit comprises a plurality of parallel PRP branches; each SGL or PRP branch independently processes a respective read command and generates an SGL or PRP corresponding to that read command; and one or more DMA command groups corresponding to each read command are stored in the shared memory.

5. The controller of claim 4, wherein:

the read control circuit comprises a read initiation circuit and a DMA transmission circuit;

the read initiation circuit requests the back-end module to process the read command, so that the back-end module moves the data from the NVM to the memory of the storage device; and, in response to the data corresponding to a first DMA command group having been moved to the memory of the storage device, provides the index of the first DMA command group to the DMA transmission circuit; and

the DMA transmission circuit is connected to the read initiation circuit; it acquires the one or more DMA commands of the DMA command group from the shared memory according to the DMA command group index received from the read initiation circuit, and transfers the data according to the acquired DMA commands.

6. The controller according to claim 4 or 5, wherein:

after the SGL branch generates a first DMA command group according to the SGL of a first read command and stores it in the shared memory, the SGL branch, regardless of whether the first DMA command group has been processed, acquires the SGL corresponding to a second read command in response to receiving the second read command, and generates at least one second DMA command group according to the SGL of the second read command; and/or

after the PRP branch generates a third DMA command group according to the PRP of a third read command and stores it in the shared memory, the PRP branch, regardless of whether the third DMA command group has been processed, acquires the PRP corresponding to a fourth read command in response to receiving the fourth read command, and generates at least one fourth DMA command group according to the PRP of the fourth read command.

7. The controller according to any one of claims 4 to 6, wherein:

the read initiation circuit processes the first read command while the SGL branch generates at least one second DMA command group according to the SGL of the second read command; and/or

the read initiation circuit processes the third read command while the PRP branch generates at least one fourth DMA command group according to the PRP of the fourth read command.

8. The controller of claim 7, wherein:

the read initiation circuit requests the back-end module to process the first read command, and requests the back-end module to process the second read command regardless of whether the back-end module has finished processing the first read command; and

the read initiation circuit requests the back-end module to process the third read command, and requests the back-end module to process the fourth read command regardless of whether the back-end module has finished processing the third read command.

9. The controller according to any one of claims 4 to 8, wherein:

a read command is selected for the SGL branch, and the SGL branch generates a DMA command group according to the SGL of the selected read command; and

a read command is selected for the PRP branch, and the PRP branch generates a DMA command group according to the PRP of the selected read command.

10. The controller according to any one of claims 4 to 9, wherein the SGL branch comprises an SGL acquisition sub-circuit and an SGL parsing sub-circuit,

the SGL acquisition sub-circuit is configured to: obtain the SGL from the read command, or obtain the SGL from the host according to an SGL pointer of the read command;

the SGL parsing sub-circuit is configured to: generate at least one DMA command group from the SGL, and store the DMA command group in the shared memory;

the PRP branch comprises a PRP acquisition sub-circuit and a PRP parsing sub-circuit, wherein

the PRP acquisition sub-circuit is configured to: obtain the PRP from the read command, or obtain the PRP from the host according to a PRP pointer of the read command; and

the PRP parsing sub-circuit is configured to: generate at least one DMA command group from the PRP, and store the DMA command group in the shared memory.

Technical Field

The present application relates generally to the field of data processing technology. More particularly, the present application relates to a method and controller for controlling the readout of data to a host.

Background

FIG. 1A illustrates a block diagram of a solid-state storage device. The solid-state storage device 102 is coupled to a host to provide storage capability to the host. The host and the solid-state storage device 102 may be coupled in various ways, including but not limited to SATA (Serial Advanced Technology Attachment), SCSI (Small Computer System Interface), SAS (Serial Attached SCSI), IDE (Integrated Drive Electronics), USB (Universal Serial Bus), PCIe (Peripheral Component Interconnect Express), NVMe (NVM Express), Ethernet, Fibre Channel, or a wireless communication network. The host may be any information processing device capable of communicating with the storage device in the manner described above, such as a personal computer, tablet, server, portable computer, network switch, router, cellular telephone, or personal digital assistant. The storage device 102 (hereinafter, a solid-state storage device is simply referred to as a storage device) includes an interface 103, a control component 104, one or more NVM chips 105, and a DRAM (Dynamic Random Access Memory) 110.

Common storage media for the NVM chip 105 include NAND flash memory, phase-change memory, FeRAM (Ferroelectric RAM), MRAM (Magnetoresistive RAM), RRAM (Resistive Random Access Memory), and the like.

The interface 103 may be adapted to exchange data with a host by means of, for example, SATA, IDE, USB, PCIe, NVMe, SAS, Ethernet, or Fibre Channel.

The control component 104 controls data transfer among the interface 103, the NVM chip 105, and the DRAM 110, and is also responsible for memory management, mapping of host logical addresses to flash physical addresses, wear leveling, bad block management, and the like. The control component 104 can be implemented in software, hardware, firmware, or a combination thereof; for example, it can take the form of an FPGA (Field-Programmable Gate Array), an ASIC (Application-Specific Integrated Circuit), or a combination thereof. The control component 104 may also include a processor or controller that executes software to operate the hardware of the control component 104 and process IO (Input/Output) commands. The control component 104 may also be coupled to the DRAM 110 and may access data in the DRAM 110. FTL tables and/or cached IO command data may be stored in the DRAM.

The control component 104 issues commands to the NVM chip 105 in a manner conforming to the interface protocol of the NVM chip 105 in order to operate the NVM chip 105, and receives the command execution results output by the NVM chip 105. Known NVM chip interface protocols include "Toggle", "ONFI", etc.

A memory target (Target) is one or more logical units (LUNs) that share a CE (Chip Enable) signal within a NAND flash package. A NAND flash package may contain one or more dies (Die). Typically, a logical unit corresponds to a single die. A logical unit may include a plurality of planes (Plane). Multiple planes within a logical unit may be accessed in parallel, while multiple logical units within a NAND flash chip may execute commands and report status independently of each other.

Data is typically stored on and read from a storage medium page by page, and erased in blocks. A block (also referred to as a physical block) contains a plurality of pages. Pages on the storage medium (referred to as physical pages) have a fixed size, e.g., 17664 bytes. Physical pages may also have other sizes.

In the storage device 102, mapping information from logical addresses (LBAs) to physical addresses is maintained by an FTL (Flash Translation Layer). The logical addresses constitute the storage space of the solid-state storage device as perceived by upper-level software, such as an operating system. A physical address is an address used to access a physical storage location of the solid-state storage device. In the related art, address mapping may also be implemented using an intermediate address form, for example by mapping the logical address to an intermediate address, which is in turn mapped to a physical address. A table structure storing mapping information from logical addresses to physical addresses is called an FTL table. FTL tables are important metadata in storage devices. Each entry of the FTL table records an address mapping relationship at the granularity of a data unit in the storage device.

Hosts access storage devices with IO commands that follow a storage protocol. The control component generates one or more media interface commands according to the IO commands from the host and provides the media interface commands to the media interface controller. The media interface controller generates storage media access commands (e.g., program commands, read commands, erase commands) that conform to the interface protocol of the NVM chip in accordance with the media interface commands. The control component also tracks that all media interface commands generated from one IO command are executed and indicates the processing result of the IO command to the host.

Referring to FIG. 1B, the control component includes a host interface 1041, a host command processing unit 1042, a storage command processing unit 1043, a media interface controller 1044, and a storage media management unit 1045. The host interface 1041 acquires IO commands provided by the host. The host command processing unit 1042 generates storage commands according to an IO command and provides them to the storage command processing unit 1043. Each storage command may access storage space of the same size, e.g., 4 KB. The data unit recorded in the NVM chip that corresponds to the data accessed by one storage command is referred to as a data frame. A physical page records one or more data frames. For example, if a physical page is 17664 bytes in size and a data frame is 4 KB in size, one physical page can store 4 data frames.
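The frame count in this example follows from integer division. The short Python check below uses only the two sizes given in the text; the constant names are illustrative, and the text does not say how the leftover bytes of the page are used.

```python
# Sizes taken from the example in the text.
PHYSICAL_PAGE_BYTES = 17664      # example physical page size
DATA_FRAME_BYTES = 4 * 1024      # 4 KB data frame

# Whole data frames that fit in one physical page, and the bytes left over.
frames_per_page = PHYSICAL_PAGE_BYTES // DATA_FRAME_BYTES
leftover_bytes = PHYSICAL_PAGE_BYTES % DATA_FRAME_BYTES

print(frames_per_page, leftover_bytes)
```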

The storage media management unit 1045 maintains the logical-to-physical address translation for each storage command. For example, the storage media management unit 1045 includes an FTL table (FTL is described above). For a read command, the storage media management unit 1045 outputs the physical address corresponding to the logical address (LBA) accessed by the storage command. For a write command, the storage media management unit 1045 allocates an available physical address to it and records the mapping between the logical address (LBA) it accesses and the allocated physical address. The storage media management unit 1045 also implements functions required to manage the NVM chips, such as garbage collection and wear leveling.
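The read and write paths described above can be pictured with a toy mapping table. This is a minimal sketch only: the class name `FtlTable`, its methods, and the free-list allocator are hypothetical and are not taken from the application, which does not specify how the FTL table is organized.

```python
class FtlTable:
    """Toy logical-to-physical map; real FTL tables are far more elaborate."""

    def __init__(self, num_physical_units):
        self._map = {}                                  # LBA -> physical address
        self._free = list(range(num_physical_units))    # hypothetical free list

    def lookup(self, lba):
        """Read path: return the physical address mapped to an LBA, or None."""
        return self._map.get(lba)

    def allocate(self, lba):
        """Write path: take a free physical address and record the mapping."""
        if not self._free:
            raise RuntimeError("no free physical units (garbage collection needed)")
        phys = self._free.pop(0)
        self._map[lba] = phys       # a rewrite simply remaps the LBA
        return phys


ftl = FtlTable(num_physical_units=4)
first = ftl.allocate(lba=10)     # write to LBA 10
second = ftl.allocate(lba=10)    # overwrite LBA 10: a new physical address is used
```

Overwriting an LBA leaves the old physical unit stale in this sketch; reclaiming such units is the job of garbage collection, which is not modeled here.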

The storage command processing unit 1043 operates the media interface controller 1044 to issue a storage media access command to the NVM chip 105 according to the physical address provided by the storage media management unit 1045.

For clarity, commands sent by the host to the storage device 102 are referred to as IO commands, commands sent by the host command processing unit 1042 to the storage command processing unit 1043 are referred to as storage commands, commands sent by the storage command processing unit 1043 to the media interface controller 1044 are referred to as media interface commands, and commands sent by the media interface controller 1044 to the NVM chip 105 are referred to as storage media access commands. The storage medium access commands follow the interface protocol of the NVM chip.

In the NVMe protocol, after receiving a write command, the solid-state storage device 102 obtains data from the memory of the host through the host interface 1041, and then writes the data into the flash memory. For a read command, the solid state storage device 102 moves data to the host memory through the host interface 1041 after the data is read from the flash memory.

Data transferred between a host and a storage device is described in one of two ways: PRP (Physical Region Page) or SGL (Scatter/Gather List). A PRP consists of a number of linked PRP entries, each of which is a 64-bit physical memory address that describes one physical page (Page) of space. An SGL is a linked list consisting of one or more SGL segments, and each SGL segment consists of one or more SGL descriptors. Each SGL descriptor describes the address and length of a data buffer, that is, each SGL descriptor corresponds to one region of host memory address space. Each SGL descriptor has a fixed size (e.g., 16 bytes).

Both PRP and SGL essentially describe one or more address spaces in host memory, and these address spaces may be located anywhere in host memory. The host carries PRP- or SGL-related information in NVMe commands to tell the storage device where the source data is in host memory, or where in host memory the data read from the flash memory should be placed.

In the prior art, when the host command processing unit 1042 processes an IO command, it needs to obtain the corresponding SGL or PRP from the host according to the IO command and parse the SGL or PRP to determine the corresponding host memory address. FIG. 1C shows the basic structure of a prior-art host command processing unit 1042. As shown in FIG. 1C, the host command processing unit 1042 mainly includes a shared memory, a DMA module, and a sub-CPU system. The sub-CPU system comprises a plurality of CPUs, which run programs to process SGLs or PRPs and to configure the DMA module. The DMA module processes DMA commands and carries out data transfer between the host and the storage device. The shared memory is used to store data, NVMe commands, and the like.

Taking SGL as an example, as shown in FIG. 2, one SGL includes three SGL segments. The first SGL segment includes one SGL descriptor: SGL descriptor 0-1. The second SGL segment includes three SGL descriptors: SGL descriptor 1-1, SGL descriptor 1-2, and SGL descriptor 1-3. The third SGL segment includes two SGL descriptors: SGL descriptor 2-1 and SGL descriptor 2-2.

SGL descriptor 0-1 describes a 3KB data space in host memory, namely memory block A; similarly, SGL descriptor 1-1 describes 2KB memory block B, SGL descriptor 1-2 describes 2KB memory block C, SGL descriptor 1-3 describes 1KB memory block D, SGL descriptor 2-1 describes 4KB memory block E, and SGL descriptor 2-2 describes 1KB memory block F. An SGL linked list describes a total of 13KB of data space.
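The structure and sizes of this example can be written down and checked in a few lines of Python. The class and field names are illustrative; host addresses are omitted because the text does not give them.

```python
from dataclasses import dataclass

KB = 1024


@dataclass(frozen=True)
class SglDescriptor:
    name: str      # label used in the text, e.g. "0-1"
    block: str     # host memory block the descriptor describes
    length: int    # size of that block in bytes


# The three SGL segments of FIG. 2, in order.
sgl = [
    [SglDescriptor("0-1", "A", 3 * KB)],
    [SglDescriptor("1-1", "B", 2 * KB),
     SglDescriptor("1-2", "C", 2 * KB),
     SglDescriptor("1-3", "D", 1 * KB)],
    [SglDescriptor("2-1", "E", 4 * KB),
     SglDescriptor("2-2", "F", 1 * KB)],
]

# Total data space described by the whole linked list.
total_bytes = sum(d.length for segment in sgl for d in segment)
print(total_bytes // KB, "KB")
```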

Disclosure of Invention

After the storage device receives an IO command from the host, for example a read command, the host command processing unit obtains an SGL or a PRP according to the IO command. For example, it fetches the SGL or PRP from the host according to an SGL pointer or a PRP pointer carried by the read command and places it in the shared memory. A CPU in the host command processing unit must then parse the SGL or PRP to generate one or more DMA commands, which occupies CPU resources and increases CPU load. In addition, it is not known in advance whether a read command received from the host uses an SGL or a PRP, so the storage device must identify which one applies and then process it accordingly; because SGLs and PRPs are processed differently, this increases processing complexity. Moreover, the data size indicated by a read command sent by the host is not of fixed length, which makes processing less convenient and less regular. Furthermore, the storage device can receive multiple read commands from the host, and because resources are limited, the processing of those read commands may contend for resources and cause conflicts.
In addition, while a read command is being processed, it is uncertain which DMA command's data will be read out first. For example, suppose the logical addresses indicated by the read command are LBA 0 to LBA n and those indicated by the currently processed DMA command are LBA m to LBA p, with 0 < m < n and m < p < n; that is, data in the middle of the read command's logical address range is read out first. To move the data indicated by the current DMA command from the memory of the storage device to the host, the corresponding host memory address must be calculated from the SGL, and this calculation is complicated.
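To illustrate why this lookup is awkward, the sketch below finds the host address for a byte offset in the middle of a transfer by walking the SGL from its start. It reuses the block sizes of FIG. 2, but the host base addresses and the function name `locate` are invented for illustration; the application does not describe the prior-art calculation in this form.

```python
KB = 1024

# (host base address, length) per SGL descriptor, in list order.
# Sizes follow FIG. 2 (blocks A-F); the addresses are hypothetical.
sgl_entries = [
    (0x01000, 3 * KB),   # A
    (0x08000, 2 * KB),   # B
    (0x20000, 2 * KB),   # C
    (0x30000, 1 * KB),   # D
    (0x40000, 4 * KB),   # E
    (0x50000, 1 * KB),   # F
]


def locate(entries, offset):
    """Return (host_address, bytes_left_in_entry) for a byte offset into the transfer.

    The walk is linear: every descriptor before the target must be visited.
    """
    for base, length in entries:
        if offset < length:
            return base + offset, length - offset
        offset -= length
    raise ValueError("offset beyond the data described by the SGL")


# Data at offset 6 KB falls 1 KB into block C (A covers 0-3 KB, B covers 3-5 KB).
host_addr, remaining = locate(sgl_entries, 6 * KB)
```

Each out-of-order arrival needs such a walk, and a transfer that crosses a descriptor boundary needs further splitting on top of it.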

In the present application, a hardware device independent of the CPU is provided in the host command processing unit. The hardware device parses the SGL or PRP associated with a read command to generate one or more DMA commands and stores them in the shared memory, which frees the CPU from the work of parsing the SGL or PRP and thereby reduces the CPU load.

Further, to reduce the complexity of parsing SGLs or PRPs, one or more parallel SGL and/or PRP branches are provided in the hardware device. SGLs are parsed by the SGL branches and PRPs by the PRP branches; each SGL branch or PRP branch can process one read command, so the parallel branches together can process multiple read commands at the same time. On the one hand, the hardware device can parse both SGLs and PRPs and provides a uniform way to process either, which reduces processing complexity. On the other hand, the hardware device can process multiple read commands in parallel, which reduces the likelihood that the processing of multiple read commands contends for resources and causes conflicts.

Further, in the solution provided by the embodiments of the present application, the hardware device splits a read command of non-fixed length into a plurality of fixed-length storage commands (e.g., DMA command groups), for example with each storage command indicating 4 KB of data, and then generates one or more DMA commands for each storage command, which makes processing more convenient and more regular. In addition, because each storage command fixes the position of its logical addresses within the logical address range of the read command, the corresponding DMA commands can be found directly from the storage command. This avoids the complex calculation otherwise needed once data for the read command has been read out into the memory of the storage device, and improves the overall processing efficiency of the read command.
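The splitting step can be sketched in software as follows, although the application performs it in hardware. The sketch cuts the SGL of FIG. 2 into 4 KB DMA command groups. The function and class names, the host base addresses, and the assumption that data sits contiguously in storage-device memory starting at `device_base` are all hypothetical.

```python
from dataclasses import dataclass

KB = 1024
GROUP_BYTES = 4 * KB   # fixed size of one DMA command group (example value from the text)


@dataclass(frozen=True)
class DmaCommand:
    host_addr: int     # destination in host memory
    device_addr: int   # source in storage-device memory (assumed contiguous)
    length: int        # bytes to move


def build_dma_command_groups(sgl_entries, device_base=0, group_bytes=GROUP_BYTES):
    """Split (host_addr, length) SGL entries into fixed-size DMA command groups."""
    groups, room, device_addr = [], 0, device_base
    for host_addr, length in sgl_entries:
        while length > 0:
            if room == 0:                  # current group is full: open a new one
                groups.append([])
                room = group_bytes
            chunk = min(length, room)      # never cross a descriptor or group boundary
            groups[-1].append(DmaCommand(host_addr, device_addr, chunk))
            host_addr += chunk
            device_addr += chunk
            length -= chunk
            room -= chunk
    return groups


# Blocks A-F of FIG. 2 with made-up host addresses.
sgl_entries = [(0x01000, 3 * KB), (0x08000, 2 * KB), (0x20000, 2 * KB),
               (0x30000, 1 * KB), (0x40000, 4 * KB), (0x50000, 1 * KB)]
groups = build_dma_command_groups(sgl_entries)
group_sizes = [sum(c.length for c in g) for g in groups]
```

Under these assumptions, group k always covers bytes k * 4 KB onward of the read, so when the data for group k arrives first, its host addresses are obtained by indexing `groups[k]` rather than by re-walking the SGL.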

According to a first aspect of the present application, there is provided a first method for controlling data readout to a host, comprising: in response to a received read command, acquiring an SGL or PRP corresponding to the read command; generating at least one DMA command group from the SGL or PRP, each DMA command group including at least one DMA command; storing the DMA command group in a shared memory; obtaining a stored DMA command group from the shared memory; and, in response to receiving an indication of data corresponding to a given DMA command group, moving the data to the host according to that DMA command group.

According to the first method for controlling data readout to a host of the first aspect of the present application, there is provided a second method for controlling data readout to a host of the first aspect, wherein acquiring the SGL or PRP corresponding to the read command includes: obtaining an SGL pointer or a PRP pointer from the read command, obtaining the SGL or PRP from the host according to the SGL pointer or PRP pointer, and storing the SGL or PRP; or obtaining the SGL or PRP from the read command itself and storing the SGL or PRP.

According to the first method for controlling data readout to a host of the first aspect of the present application, there is provided a third method for controlling data readout to a host of the first aspect, wherein the data size indicated by the read command is not fixed; the data size indicated by each DMA command group is either a fixed value or the data size indicated by the IO command modulo the fixed value; the sum of the data sizes indicated by the DMA command groups equals the data size indicated by the read command; and the sum of the data sizes indicated by the DMA commands in a group equals the data size indicated by that DMA command group.
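These size relationships reduce to a quotient and a remainder. The sketch below lists the group sizes for a read of a given length; the function name is illustrative, and the sketch assumes that only the last group may be short, which the text implies but does not state.

```python
def dma_group_sizes(read_bytes, fixed=4 * 1024):
    """Sizes of the DMA command groups for a read of read_bytes bytes.

    Every group has the fixed size except possibly the last, whose size is
    read_bytes modulo the fixed value (assumed placement of the short group).
    """
    full, tail = divmod(read_bytes, fixed)
    return [fixed] * full + ([tail] if tail else [])


sizes_13k = dma_group_sizes(13 * 1024)   # the 13 KB example of FIG. 2
sizes_8k = dma_group_sizes(8 * 1024)     # an exact multiple has no short group
```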

According to the third method for controlling data readout to a host of the first aspect of the present application, there is provided a fourth method for controlling data readout to a host of the first aspect, wherein the fixed value is 4 KB.

According to a fourth method for controlling data read-out to a host according to the first aspect of the present application, there is provided a fifth method for controlling data read-out to a host according to the first aspect of the present application, each DMA command comprising a host memory address and a storage device memory address.

According to the fifth method for controlling data readout to a host of the first aspect of the present application, there is provided a sixth method for controlling data readout to a host of the first aspect, wherein one or more SGL processing units process a plurality of read commands in parallel, or one or more PRP processing units process a plurality of read commands in parallel, to obtain the SGL or PRP corresponding to each read command; one or more DMA command groups corresponding to each read command are stored in the shared memory; each read command is processed and the data to be accessed by each read command is moved from the NVM to the memory of the storage device; and, in response to the data corresponding to a first DMA command group having been moved to the memory of the storage device, the one or more DMA commands of the first DMA command group are acquired from the shared memory and the data is transferred according to those DMA commands.
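The last step, fetching a group's DMA commands once its data is in storage-device memory and then moving the data, can be mimicked with byte arrays. Everything in this sketch is a stand-in: the dictionary plays the role of the shared memory, the tuple layout `(host_addr, device_addr, length)` and the key `("read0", index)` are hypothetical, and the sizes are shrunk to a few bytes for readability.

```python
def transfer_group(shared_memory, key, device_mem, host_mem):
    """Fetch one DMA command group by key and copy its data to host memory."""
    for host_addr, device_addr, length in shared_memory[key]:
        host_mem[host_addr:host_addr + length] = device_mem[device_addr:device_addr + length]


# Toy memories: 8 bytes already read from the NVM, and 16 bytes of host memory.
device_mem = bytearray(b"ABCDEFGH")
host_mem = bytearray(16)

# Two DMA command groups of one read command, stored ahead of time.
shared_memory = {
    ("read0", 0): [(10, 0, 2), (4, 2, 2)],   # bytes 0-3 of the read, scattered on the host
    ("read0", 1): [(0, 4, 4)],               # bytes 4-7 of the read
}

# The data for group 1 happens to be ready first; no address calculation is needed.
transfer_group(shared_memory, ("read0", 1), device_mem, host_mem)
transfer_group(shared_memory, ("read0", 0), device_mem, host_mem)
```

Because each group already carries its host addresses, the order in which groups become ready does not matter.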

According to the sixth method for controlling data readout to a host of the first aspect of the present application, there is provided a seventh method for controlling data readout to a host of the first aspect, wherein after the SGL unit generates a first DMA command group according to the SGL of a first read command and stores it in the shared memory, the SGL unit, regardless of whether the first DMA command group has been processed, acquires the SGL corresponding to a second read command in response to receiving the second read command, and generates at least one second DMA command group according to the SGL of the second read command; and/or after the PRP unit generates a third DMA command group according to the PRP of a third read command and stores it in the shared memory, the PRP unit, regardless of whether the third DMA command group has been processed, acquires the PRP corresponding to a fourth read command in response to receiving the fourth read command, and generates at least one fourth DMA command group according to the PRP of the fourth read command.

According to the seventh method for controlling data readout to a host of the first aspect of the present application, there is provided an eighth method for controlling data readout to a host of the first aspect, wherein the first read command is processed while the SGL unit generates at least one second DMA command group according to the SGL of the second read command; and/or the third read command is processed while the PRP unit generates at least one fourth DMA command group according to the PRP of the fourth read command.

According to a sixth method for controlling data readout to a host according to the first aspect of the present application, there is provided a ninth method for controlling data readout to a host according to the first aspect of the present application, the DMA commands of the plurality of DMA command groups stored in the shared memory are processed in parallel.

According to a sixth method for controlling data readout to a host according to the first aspect of the present application, there is provided a tenth method for controlling data readout to a host according to the first aspect of the present application, selecting a plurality of read commands for processing according to characteristics of the read commands.

According to a method for controlling data readout to a host according to any one of the sixth to tenth aspects of the first aspect of the present application, there is provided the eleventh method for controlling data readout to a host according to the first aspect of the present application, selecting a read command for an SGL unit and generating a DMA command group by the SGL unit according to an SGL of the selected read command; and/or selecting a read command for a PRP unit and having the PRP unit generate a DMA command set based on the PRP of the selected read command.

According to the eleventh method for controlling data readout to a host of the first aspect of the present application, there is provided a twelfth method for controlling data readout to a host of the first aspect, wherein a read command having a first characteristic is selected for a first SGL unit, and read commands other than those having the first characteristic are selected for a second SGL unit.

According to the eleventh or twelfth method for controlling data readout to a host of the first aspect of the present application, there is provided the thirteenth method for controlling data readout to a host of the first aspect of the present application, selecting a read command having a first characteristic for the first PRP unit; a read command other than the read command having the first characteristic is selected for the second PRP unit.

According to the sixth method for controlling data readout to a host of the first aspect of the present application, there is provided a fourteenth method for controlling data readout to a host of the first aspect, wherein a notification of the end of data transfer is generated when the data transfer indicated by one DMA command ends, or when the data transfer indicated by one DMA command group ends; and, based on these notifications, when it is recognized that the data indicated by all DMA command groups corresponding to a first read command has been moved, a notification that execution of the first read command is complete is generated, and the space occupied in the shared memory by the DMA command groups corresponding to the first read command is released.
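One simple way to picture this bookkeeping is a per-command set of outstanding DMA command groups. The sketch below is an assumption-laden illustration: the class name, its methods, and the use of a dictionary entry to stand for space in the shared memory are hypothetical and not taken from the application.

```python
class ReadCompletionTracker:
    """Tracks which DMA command groups of each read command are still outstanding."""

    def __init__(self):
        self._pending = {}   # read command id -> set of outstanding group indices

    def register(self, cmd_id, group_count):
        """Called when the groups of a read command are stored in shared memory."""
        self._pending[cmd_id] = set(range(group_count))

    def on_group_done(self, cmd_id, group_index):
        """Handle one end-of-transfer notification.

        Returns True when every group of the read command has been moved,
        i.e. when the read command's completion notification can be generated.
        """
        outstanding = self._pending[cmd_id]
        outstanding.discard(group_index)
        if outstanding:
            return False
        del self._pending[cmd_id]   # stands in for releasing the groups' space
        return True

    def is_tracked(self, cmd_id):
        return cmd_id in self._pending


tracker = ReadCompletionTracker()
tracker.register(cmd_id=7, group_count=3)
results = [tracker.on_group_done(7, i) for i in (1, 0, 2)]   # groups finish out of order
```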

According to a fourteenth method for controlling data read out to a host according to the first aspect of the present application, there is provided the fifteenth method for controlling data read out to a host according to the first aspect of the present application, wherein when recognizing that data indicated by all DMA command groups corresponding to a plurality of read commands are completely moved, a notification corresponding to completion of execution of the plurality of read commands is generated, and the notification of completion of execution of the plurality of read commands is collectively sent to the host.

According to a second aspect of the present application, there is provided a controller for controlling data readout to a host according to the second aspect of the present application, comprising: an SGL circuit and/or a PRP circuit, a read control circuit, and a shared memory. The SGL circuit is to: acquire, in response to a received read command, an SGL corresponding to the read command; generate at least one DMA command group from the SGL, each DMA command group including at least one DMA command; and store the DMA command group in the shared memory. The PRP circuit is to: acquire, in response to a received read command, a PRP corresponding to the read command; generate at least one DMA command group according to the PRP, each DMA command group including at least one DMA command; and store the DMA command group in the shared memory. The read control circuit is to: obtain a stored DMA command group from the shared memory; and, in response to receiving the data corresponding to a certain DMA command group, determine a corresponding host memory address according to the DMA command group and move the data to the host. The shared memory is to store the DMA command group.

A controller according to a second aspect of the present application for controlling data read-out to a host, the SGL circuit comprising a plurality of parallel SGL branches, the PRP circuit comprising a plurality of parallel PRP branches, each SGL or PRP branch independently processing a respective read command, generating a SGL or PRP corresponding to the respective read command; and storing one or more DMA command groups corresponding to each read command into the shared memory.

According to the controller for controlling data readout to a host of the second aspect of the present application, there is provided the third controller for controlling data readout to a host according to the second aspect of the present application, wherein the read control circuit includes a read initiate circuit and a DMA transfer circuit; the read initiate circuit requests the back-end module to process the read command, so that the back-end module moves the data from the NVM to the memory of the storage device, and, in response to the data corresponding to a first DMA command group being moved to the memory of the storage device, provides the index of the first DMA command group to the DMA transfer circuit; the DMA transfer circuit is connected with the read initiate circuit, acquires one or more DMA commands corresponding to the DMA command group from the shared memory according to the DMA command group index received from the read initiate circuit, and carries out data transfer according to the acquired one or more DMA commands.

A fourth controller for controlling data readout to a host according to the second aspect of the present application is provided, wherein after the SGL branch generates a first DMA command group according to the SGL of a first read command and stores it in the shared memory, the SGL branch, in response to a received second read command and regardless of whether the first DMA command group has been processed, acquires the SGL corresponding to the second read command and generates at least one second DMA command group according to the SGL of the second read command; and/or after the PRP branch generates a third DMA command group according to the PRP of a third read command and stores it in the shared memory, the PRP branch, in response to a received fourth read command and regardless of whether the third DMA command group has been processed, acquires the PRP corresponding to the fourth read command and generates at least one fourth DMA command group according to the PRP of the fourth read command.

According to a fourth controller for controlling data read-out to a host according to the second aspect of the present application, there is provided a fifth controller for controlling data read-out to a host according to the second aspect of the present application, wherein a read initiate circuit processes a first read command while the SGL branch generates at least one second DMA command group according to an SGL of a second read command; and/or the read initiate circuit processes the third read command while the PRP branch generates at least one fourth DMA command group according to the PRP of the fourth read command.

A controller for controlling data read out to a host according to a third aspect of the present application provides the controller for controlling data read out to a host according to the sixth aspect of the present application, the DMA transfer circuit processing DMA commands of a plurality of DMA command groups stored in the shared memory in parallel.

A seventh controller for controlling data readout to a host according to the second aspect of the present application is provided, the read initiate circuit selecting a plurality of read commands for processing according to characteristics of the read commands.

A fourth controller for controlling data readout to a host according to the second aspect of the present application, there is provided the eighth controller for controlling data readout to a host according to the second aspect of the present application, the read initiate circuit requesting the back-end module to process the first read command; requesting the back-end module to process the second read command regardless of whether the first read command is processed by the back-end module; the reading initiating circuit requests the back-end module to process a third reading command; the back-end module is requested to process the fourth read command regardless of whether the third read command is processed by the back-end module.

According to an eighth controller for controlling data readout to a host according to the second aspect of the present application, there is provided a ninth controller for controlling data readout to a host according to the second aspect of the present application, selecting a read command for an SGL branch and generating a DMA command group by the SGL branch according to an SGL of the selected read command; a read command is selected for the PRP branch and a DMA command group is generated by the PRP branch based on the PRP of the selected read command.

A ninth controller for controlling data readout to a host according to the second aspect of the present application, there is provided a tenth controller for controlling data readout to a host according to the second aspect of the present application, selecting a read command having a first characteristic for a first SGL branch; other read commands than the read command having the first characteristic are selected for the second SGL branch.

According to the ninth or tenth controller for controlling data readout to a host of the second aspect of the present application, there is provided the eleventh controller for controlling data readout to a host according to the second aspect of the present application, wherein a read command having a first characteristic is selected for a first PRP branch, and read commands other than the read command having the first characteristic are selected for a second PRP branch.

The controller for controlling data read out to a host according to the third aspect of the present application, according to the twelfth aspect of the present application, wherein a notification of data transfer completion is generated when data transfer instructed by one DMA command is completed or when data transfer instructed by one DMA command group is completed; and according to the notification of the data moving end, when the read initiating circuit recognizes that the data indicated by all the DMA command groups corresponding to the first read command are moved, generating a notification of the completion of the execution of the first read command, and releasing the space of the DMA command group corresponding to the first read command in the shared memory.

According to the ninth or twelfth controller for controlling data read out to the host according to the second aspect of the present application, there is provided the thirteenth controller for controlling data read out to the host according to the second aspect of the present application, wherein when the read initiate circuit recognizes that the data indicated by all DMA command groups corresponding to the plurality of read commands are completely moved, the read initiate circuit generates a notification corresponding to the completion of the execution of the plurality of read commands, and sends the notification of the completion of the execution of the plurality of read commands to the host together.

According to the controller for controlling data readout to a host of the second aspect of the present application, there is provided the fourteenth controller for controlling data readout to a host according to the second aspect of the present application, wherein the SGL branch comprises an SGL acquisition sub-circuit and an SGL parsing sub-circuit; the SGL acquisition sub-circuit is configured to obtain an SGL from a read command, or obtain the SGL from a host according to an SGL pointer of the read command; the SGL parsing sub-circuit is configured to generate at least one DMA command group from the SGL and store the DMA command group in the shared memory; the PRP branch comprises a PRP acquisition sub-circuit and a PRP parsing sub-circuit; the PRP acquisition sub-circuit is configured to obtain a PRP from a read command, or obtain the PRP from a host according to a PRP pointer of the read command; and the PRP parsing sub-circuit is configured to generate at least one DMA command group according to the PRP and store the DMA command group in the shared memory.

A fourteenth controller for controlling data readout to a host according to the second aspect of the present application, there is provided the fifteenth controller for controlling data readout to a host according to the second aspect of the present application, each SGL branch further comprising an SGL caching sub-circuit for caching SGLs; each PRP branch further comprises a PRP cache sub-circuit for caching PRP.

A controller for controlling data readout to a host according to the third or fourteenth or fifteenth aspect of the present application, there is provided the controller for controlling data readout to a host according to the sixteenth aspect of the present application, wherein the read initiate circuit includes a CPU.

According to a third aspect of the present application there is provided a memory according to the third aspect of the present application comprising a controller according to any one of the second aspects of the present application.

According to a fourth aspect of the present application, there is provided an electronic device according to the fourth aspect of the present application, comprising the controller of any one of the second aspects of the present application.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to the drawings.

FIG. 1A is a block diagram of a solid-state storage device of the prior art;

FIG. 1B is a schematic diagram of a control unit in the prior art;

FIG. 1C is a schematic diagram of a host command processing unit in the prior art;

FIG. 2 is a diagram illustrating mapping between SGLs and host memory address spaces;

FIG. 3 is a flow chart of a method for controlling data read out to a host according to an embodiment of the present application;

FIG. 4A is a block diagram of a circuit structure of an SGL-based circuit for processing a read command according to an embodiment of the present application;

FIG. 4B is a block diagram of a PRP-based circuit for processing a read command according to an embodiment of the present disclosure;

FIG. 5 is a block diagram of a read command processing circuit according to an embodiment of the present application;

FIG. 6 is a block diagram of an SGL/PRP unit according to an embodiment of the present application;

FIG. 7A is a schematic diagram of an SGL parser sub-circuit according to an embodiment of the present disclosure;

FIG. 7B is a diagram illustrating a read command processing based on SGL parsing according to the prior art;

FIG. 7C is a schematic diagram illustrating a read command processing manner based on SGL parsing according to an embodiment of the present disclosure;

FIG. 8 is a diagram illustrating a parallel processing scheme of the SGL/PRP unit and the DMA transfer circuit;

FIG. 9 is a diagram of a parallel processing mechanism of the SGL/PRP unit, the read initiate circuit, the DMA transfer circuit, and the backend module; and

FIG. 10 is a block diagram of a multi-branch SGL/PRP unit according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 3 is a flowchart illustrating a method for controlling data read out to a host according to an embodiment of the present application, i.e., a method for processing a read command. As shown in fig. 3, in step 301, an SGL or a PRP corresponding to a read command is obtained in response to the received read command. NVMe has two types of commands: admin commands, with which the host manages and controls the storage device, and IO commands, including read commands and write commands, which control data transfer between the host and the storage device. The SGL or PRP field in an IO command indicates the location of the data in the host memory (for write commands) or the host memory address to which the data needs to be written (for read commands). One IO command may transfer, for example, 128KB of data.

In addition, the IO command also contains the starting logical address (LBA) and data length of the storage device to be accessed. For a write command, after the storage device obtains data from the host memory, the data is written into the flash memory, and a mapping relationship between the logical address LBA and the physical address is generated and recorded through the FTL table. For the read command, the storage device searches the FTL table according to the LBA, finds a corresponding physical address, and obtains data from a physical block corresponding to the physical address.
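
The role of the FTL table in the write and read paths described above can be illustrated with a minimal sketch. The class name `FtlTable` and its methods are hypothetical, and the table is modeled as a plain in-memory dictionary; an actual FTL is considerably more involved.

```python
class FtlTable:
    """Toy logical-to-physical mapping table (illustrative assumption only)."""

    def __init__(self):
        self._map = {}  # LBA -> physical address

    def record_write(self, lba, physical_address):
        # Write path: after data is written to flash, record where it landed.
        self._map[lba] = physical_address

    def lookup(self, lba):
        # Read path: find the physical address for the requested LBA.
        if lba not in self._map:
            raise KeyError(f"LBA {lba} has not been written")
        return self._map[lba]
```

A later write to the same LBA simply replaces the recorded physical address, so a subsequent read follows the newest mapping.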

The PRP field or SGL field in the IO command may be the SGL or the PRP itself, pointing to the host memory address space to be accessed, or may be a pointer, pointing to the SGL or PRP or to an SGL or PRP linked list. In either form, the storage device can always obtain the corresponding SGL or PRP according to the IO command. Specifically, in the NVMe protocol, the length of the IO command is 64 bytes, so when the SGL or PRP is small (e.g., a PRP within 8 bytes or an SGL within 16 bytes), the IO command can directly carry the SGL or PRP; when the SGL or PRP is large (its size is related to the fragmentation degree and/or the size of the address space it describes), the IO command carries a pointer to the SGL or PRP.

Based on this, in one embodiment, the read command carries the SGL or PRP, which can be directly retrieved in response to receiving the read command. In another embodiment, the read command carries an SGL or PRP pointer, and in response to receiving the read command, the storage device accesses the host according to the SGL or PRP pointer to obtain the SGL or PRP from the host.
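
The two embodiments above can be sketched as a single acquisition routine. This is only an illustration under stated assumptions: the read command is modeled as a dictionary with a hypothetical "inline" or "pointer" key, and `fetch_from_host` is a hypothetical callable standing in for the access to host memory through the host interface.

```python
def acquire_descriptor(read_command, fetch_from_host):
    """Return the SGL or PRP that belongs to a read command.

    read_command: dict with either an "inline" entry (the SGL/PRP itself)
                  or a "pointer" entry (host address of the SGL/PRP).
    fetch_from_host: callable taking a host address and returning the list
                     stored there (assumed helper, not from the application).
    """
    inline = read_command.get("inline")
    if inline is not None:
        # The command carries the SGL or PRP directly.
        return inline
    # The command carries only a pointer; fetch the list from the host.
    return fetch_from_host(read_command["pointer"])
```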

In one embodiment, the obtaining the SGL or PRP may be performed by a CPU, for example, the CPU obtains a read command from a host interface and stores the read command in a shared memory.

Next, in step 302, at least one DMA command group is generated from the SGL or PRP, each DMA command group including at least one DMA command. DMA commands are commands for controlling the DMA to perform data transfers, each DMA command being used to perform one data move operation between the host and the storage device. Corresponding to the SGL or PRP, the DMA command group indicates a mapping of host memory address space to storage device memory address space.

DMA (Direct Memory Access) is also called a group data transfer method. A DMA transfer copies data from one address space to another, providing high-speed data transfer between the host and memory or between memory and memory. The transfer itself is carried out by the DMA controller. DMA transfer does not need the CPU to control the transfer directly and does not go through interrupt handling; it opens a hardware channel for transferring data directly, which greatly improves the efficiency of the CPU.

One DMA command as described above implements one data transfer by the DMA technique. The DMA command includes a host memory address and a storage device memory address (e.g., a DRAM address). For a read command, the host memory address is the destination address and the storage device memory address is the source address. The host memory address contained in the DMA command is determined according to the address space described by the SGL or the PRP. The storage device memory address is allocated by the storage device, which can allocate a continuous range of its memory address space so as to improve the processing efficiency of the DMA. It should be noted that the storage device memory address is independent of the LBA. The present application focuses only on data transmission between the host memory and the storage device memory, and not on data transmission between the storage device memory and the flash memory (whose data may be denoted by LBA); the latter belongs to the prior art and is not the technical innovation of the present application.
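
The content of a DMA command described above can be sketched as a small record. The field names, the explicit `length` field, and the helper `build_commands` are assumptions made for illustration; the sketch only shows host addresses taken from the SGL or PRP being paired with contiguously allocated storage device memory addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DmaCommand:
    host_address: int    # destination address for a read command
    device_address: int  # source address in storage device memory (e.g. DRAM)
    length: int          # bytes moved by this command (assumed field)


def build_commands(host_segments, device_base):
    """Pair each host segment with a contiguous slice of device memory.

    host_segments: list of (host_address, length) taken from the SGL or PRP.
    device_base:   first address of the contiguous device memory range.
    """
    commands = []
    offset = 0
    for host_address, length in host_segments:
        commands.append(DmaCommand(host_address, device_base + offset, length))
        offset += length  # device addresses stay contiguous
    return commands
```

In this sketch the host segments may be scattered, while the device side advances linearly, which matches the continuous allocation mentioned in the text.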

In one application scenario, the DMA command group may be a DMA command list, with one DMA command list listing a plurality of DMA commands. A DMA command list is used to manipulate data of a specified length. In the example of the present application, the specified length may be one data frame size, i.e., 4 KB; the size of 4KB can be called a Data Transfer Unit (DTU) corresponding to each entry of the FTL table; and when the storage device processes the read command, initiating data transmission according to the DTU as a unit.

Based on this, one read command may include one or more DMA command groups. For example, a read command indicates a data size of 4KB, which corresponds to a DMA command group. As another example, a read command indicates a data size of 12KB, which corresponds to 3 DMA command groups.

In another application scenario, the data size indicated by a DMA command group may also be less than the length of one DTU. For example, a read command indicates a data size of 13KB, which corresponds to 4 DMA command groups: the first 3 DMA command groups each indicate a data size of 4KB, and the 4th DMA command group indicates a data size of 1KB. That is, the data size indicated by a DMA command group is either a fixed value (4KB) or the remainder of the data size indicated by the read command modulo the fixed value; here, 13KB modulo 4KB is 1KB.
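
The arithmetic in the examples above can be sketched as follows. This is a minimal illustration, assuming that the read starts on a DTU boundary; the function name `group_sizes` and the constant `DTU_SIZE` are hypothetical and not part of the present application.

```python
DTU_SIZE = 4 * 1024  # assumed DTU length in bytes (4KB in the examples)


def group_sizes(read_length, dtu_size=DTU_SIZE):
    """Return the data size covered by each DMA command group of one read."""
    full_groups, remainder = divmod(read_length, dtu_size)
    sizes = [dtu_size] * full_groups
    if remainder:
        # The last group covers whatever is left over (read_length mod DTU).
        sizes.append(remainder)
    return sizes
```

For a 13KB read, the function yields three 4KB groups followed by one 1KB group; for a 12KB read, it yields exactly three 4KB groups.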

One or more DMA commands constitute a DMA command group, and the data size indicated by each DMA command is not fixed. For example, one DMA command group is made up of 4 DMA commands, each indicating 1KB of data; as another example, a DMA command group consists of 5 DMA commands, one DMA command indicating 1KB of data and four other DMA commands each indicating 0.5KB of data. The data size indicated by a DMA command is related to the size of the address space described by the corresponding SGL descriptor. For example, if the SGL indicates 60 address spaces of 1KB each, a DMA command group may include 4 DMA commands each indicating 1KB of data; as another example, if the SGL indicates 30 address spaces of 2KB each, a DMA command group includes 2 DMA commands each indicating 2KB of data.
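
One possible way to derive DMA command groups from the address spaces described by an SGL is sketched below. It is not presented as the parsing method of the present application: the function `build_groups` is hypothetical, each host segment is modeled as a (host_address, length) pair, and every group is simply filled up to one DTU before the next group is started.

```python
def build_groups(segments, dtu_size=4096):
    """Split host segments into DMA command groups of at most one DTU each.

    segments: list of (host_address, length) pairs described by the SGL.
    Returns a list of groups; each group is a list of (host_address, length)
    pairs, one pair per DMA command.
    """
    groups = []
    current = []
    room = dtu_size  # bytes still free in the group being filled
    for host_address, length in segments:
        while length > 0:
            chunk = min(length, room)
            current.append((host_address, chunk))
            host_address += chunk
            length -= chunk
            room -= chunk
            if room == 0:
                # The group now covers a full DTU; start a new one.
                groups.append(current)
                current = []
                room = dtu_size
    if current:
        groups.append(current)  # last, possibly partial, group
    return groups
```

With 60 segments of 1KB the sketch produces 15 groups of 4 commands each, and with 30 segments of 2KB it produces 15 groups of 2 commands each, in line with the examples in the text.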

Next, in step 303, the DMA command group is stored in the shared memory. The shared memory is distinct from the storage device memory (DRAM) and the flash memory (NVM).

In one embodiment, data (e.g., a DTU) corresponding to a DMA command group needs to occupy memory space (e.g., DRAM) of a storage device, and thus after the DMA command group is processed, the memory space of the storage device needs to be released. In addition, the read command and the corresponding DMA command group are stored in the shared memory, and after a certain read command is finished, the shared memory space occupied by them needs to be released.

Finally, in step 304, in response to receiving the data indicated by a certain DMA command group, the data is transferred to the host according to the DMA command group. When data is read from the flash memory, the DMA command group corresponding to the data is obtained from the shared memory, and the data is moved to the host according to that DMA command group. For example, a read command accesses 18KB of data, denoted as LBA: 0-17KB. Five DMA command groups are generated: the first 4 DMA command groups each correspond to 4KB of data, at LBA: 0-3KB, 4-7KB, 8-11KB and 12-15KB, and the 5th DMA command group corresponds to 2KB of data, at LBA: 16-17KB. When the data of one DTU is read out from the flash memory, for example the DTU corresponding to LBA: 12-15KB, it can be determined immediately, without complicated operation, that the DTU corresponds to the 4th DMA command group; the 4th DMA command group is then acquired from the shared memory, and data transfer is executed according to its DMA commands. By analogy, each time the data of a DTU to be accessed is read out from the flash memory, the DMA command group corresponding to that DTU is determined and the data transfer is executed, until the data of the last DTU has been transferred, at which point execution of the read command is complete.
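
The remark that the DMA command group of a DTU can be found without complicated operation can be illustrated by a single integer division. The sketch assumes that the read starts on a DTU boundary and treats the ranges in the example as byte offsets; the function name and parameters are hypothetical.

```python
def group_index_for_offset(offset, read_start=0, dtu_size=4096):
    """Return the zero-based DMA command group index for a DTU.

    offset:     starting offset (in bytes) of the DTU just read from flash.
    read_start: starting offset (in bytes) of the whole read command.
    """
    return (offset - read_start) // dtu_size
```

For the 18KB example, the DTU starting at 12KB maps to index 3, i.e., the 4th DMA command group, and the final 2KB piece starting at 16KB maps to index 4, i.e., the 5th group.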

Through steps 301 to 304, the SGL or the PRP is parsed by hardware independent of the CPU, which reduces the overhead of the CPU. The specific hardware circuitry that implements the read command processing described above is further described below.

Fig. 4A shows a circuit structure of an SGL-based circuit for processing a read command (i.e., a host reads data from a storage device), i.e., a host command processing unit (e.g., the host command processing unit in fig. 1B). In the embodiment shown in FIG. 4A, the processing of the SGL/PRP is implemented using hardware circuitry that is independent of the CPU.

As shown in fig. 4A, the host command processing unit includes a shared memory, an SGL unit, and a read control circuit. The SGL unit is used for: acquiring, in response to the received read command, an SGL corresponding to the read command; generating at least one DMA command group according to the SGL, each DMA command group including at least one DMA command; and storing the DMA command group in the shared memory. The read control circuit is used for: acquiring a stored DMA command group from the shared memory; and, in response to receiving the data corresponding to a certain DMA command group, determining a corresponding host memory address according to the DMA command group and moving the data to the host. The shared memory is used for storing the DMA command group corresponding to the read command. In one embodiment, the shared memory may also be used to store read commands.

FIG. 4B shows a circuit structure for processing a read command based on PRP, which includes a shared memory, a PRP unit and a read control circuit. The PRP unit is used for: acquiring, in response to the received read command, a PRP corresponding to the read command; generating at least one DMA command group according to the PRP, each DMA command group including at least one DMA command; and storing the DMA command group in the shared memory. The read control circuit is connected to the PRP unit and is used for carrying out data transfer according to the DMA command group. The shared memory is used for storing the DMA command group corresponding to the read command. Fig. 4B differs from fig. 4A only in that the SGL unit in fig. 4A is replaced with a PRP unit for processing the PRP corresponding to the read command.

On the basis of the embodiments of fig. 4A and 4B, the host command processing unit may also include both an SGL unit and a PRP unit. When the read command carries the PRP or the PRP pointer, the PRP unit processes the read command, and when the read command carries the SGL or the SGL pointer, the SGL unit processes the read command. In one embodiment, a CPU (not shown) may participate in this process to identify the type of read command; for example, the CPU extracts the PRP/SGL field in the read command and provides the PRP/SGL field to the SGL unit or the PRP unit; that is, the CPU may identify whether the read command corresponds to the SGL or the PRP through a corresponding field of the read command. In other embodiments, hardware circuits of the SGL unit and the PRP unit (e.g., the SGL acquisition sub-circuit and the PRP acquisition sub-circuit in fig. 6) may also be used to implement the identification of the SGL and PRP types.

By way of example, FIG. 5 illustrates a read command processing circuit that includes both an SGL unit and a PRP unit, i.e., a controller that controls the reading of data from a storage device; the SGL unit and the PRP unit are together denoted as the SGL/PRP unit. It should be noted that the circuit structure shown in fig. 5 is also applicable to fig. 4A and 4B. In the circuit shown in fig. 5, the read control circuit specifically includes a read initiate circuit and a DMA transfer circuit, which cooperate with each other to control data transfer to the host. The processing of a read command is described in detail below.

As shown in FIG. 5, the host transfers a read command to the storage device through the host interface, which transfers the read command to the shared memory for storage, which is represented as process (1). The CPU (not shown) extracts the PRP/SGL field in the read command and provides the read command to the SGL/PRP unit, which is represented as process (2). Taking the processing procedure of the SGL unit as an example (the processing procedure of the PRP unit is the same, so the processing procedure of the PRP unit is not described any more), if the read command carries the SGL, caching the SGL in a cache unit, and if the read command carries the SGL pointer, acquiring the SGL from the host through the host interface and caching the SGL in the cache unit, which is represented as procedure (3); next, one or more DMA command groups are generated based on information described by the one or more SGL descriptors in the SGL, and the DMA command groups are stored in shared memory, represented as process (4). After the DMA command group generation is complete, the SGL unit notifies the read initiate circuit, which is represented as process (5), which passes a DMA command group index, e.g., a DMA command group pointer, indicating the location of the DMA command group in shared memory to the read initiate circuit.

The read initiate circuit receives the DMA command group index. Meanwhile, the read initiate circuit accesses a back-end module (the back-end module refers to other devices in the storage device, such as the storage command processing unit and the media interface controller in fig. 1B), and requests the back-end module to read out the data indicated by the read command from the flash memory (NVM) to the memory (DRAM) of the storage device, which is represented as process (6). The read initiate circuit waits for the back-end module to read out the data indicated by the DMA command group into the memory of the storage device. When the data indicated by a certain DMA command group has been read out into the memory of the storage device, the read initiate circuit can learn of this; for example, the back-end module may notify the read initiate circuit, or the read initiate circuit may learn of it from the state of the memory of the storage device, which is represented as process (7). In response to the data indicated by a certain DMA command group having been read out into the memory of the storage device, the read initiate circuit may provide the DMA command group index to the DMA transfer circuit, which is represented as process (8). The DMA transfer circuit fetches one or more DMA commands of the DMA command group from the shared memory according to the DMA command group index, which is represented as process (9-1), and performs a data transfer operation to transfer the data from the storage device memory to the host memory, which is represented as process (9-2).
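
Processes (5) and (7) to (9-2) can be modeled in software as follows. This is a behavioral sketch only, not a description of the hardware: the class and method names are hypothetical, the shared memory is modeled as a dictionary keyed by DMA command group index, each DMA command is a (host_address, device_address, length) tuple, and both memories are byte arrays.

```python
class DmaTransferCircuit:
    def __init__(self, shared_memory, device_memory, host_memory):
        self.shared_memory = shared_memory  # group index -> list of commands
        self.device_memory = device_memory  # bytearray standing in for DRAM
        self.host_memory = host_memory      # bytearray standing in for host RAM

    def transfer(self, group_index):
        # Process (9-1): fetch the DMA commands of the group.
        for host_addr, device_addr, length in self.shared_memory[group_index]:
            # Process (9-2): move the data from device memory to host memory.
            data = self.device_memory[device_addr:device_addr + length]
            self.host_memory[host_addr:host_addr + length] = data
        return group_index  # reported back as "transfer finished"


class ReadInitiateCircuit:
    def __init__(self, dma_transfer):
        self.dma_transfer = dma_transfer
        self.waiting = set()  # group indexes whose data is not yet in DRAM

    def on_group_stored(self, group_index):
        # Process (5): the SGL/PRP unit reports a newly stored group.
        self.waiting.add(group_index)

    def on_data_ready(self, group_index):
        # Process (7): the back end has placed the group's data in DRAM.
        self.waiting.remove(group_index)
        # Process (8): hand the group index to the DMA transfer circuit.
        return self.dma_transfer.transfer(group_index)
```

In this model the DMA transfer circuit needs only the group index; everything else is looked up in the shared memory, which is why the read initiate circuit does not have to touch the SGL or PRP again.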

When the data transfer indicated by one DMA command ends, or when the data transfer indicated by one DMA command group ends, a notification of the end of the data transfer is generated, which is represented as process (10). In process (5), the read initiate circuit obtains, in addition to the DMA command group index, a read command ID that identifies the read command. In one embodiment, after a DMA command group is processed, corresponding information (including, for example, the read command ID to which the DMA command group belongs) is fed back to the read initiate circuit, so that the read initiate circuit can identify which read command the DMA command group belongs to. For example, suppose a read command contains 3 DMA command groups, labeled 1#, 2# and 3#. When each of the DMA command groups 1#, 2# and 3# is processed, the read initiate circuit is notified accordingly. In one practical scenario, the DMA command groups are processed in the order 1#, 3#, 2#; after 2# is processed, the read control circuit can determine from the read command ID that all 3 DMA command groups corresponding to the read command have been processed, and generates a notification that execution of the read command is complete in order to notify the host, which is represented as process (11). In another practical scenario, the DMA command groups are processed in the order 1#, 2#, 3#; after 3# is processed, the read control circuit can likewise determine from the read command ID that all 3 DMA command groups corresponding to the read command have been processed, and generates the notification that execution of the read command is complete. According to the NVMe protocol, this notification may be implemented by operating the completion queue (CQ). While the host is being notified, the space occupied in the shared memory by the read command (if the read command is stored in the shared memory) and by the DMA command groups (1#, 2# and 3#) corresponding to the read command may be released.
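The completion bookkeeping described above can be sketched in software as follows. This is only an illustrative model, not the circuit of the application: the class names, the `register` and `on_group_done` methods, and the read command ID value 7 are all hypothetical, and appending to a list stands in for posting an entry to the completion queue.

```python
from dataclasses import dataclass, field


@dataclass
class ReadCommandState:
    """Bookkeeping for one read command (hypothetical structure)."""
    total_groups: int
    finished: set = field(default_factory=set)


class CompletionTracker:
    """Tracks which DMA command groups of each read command have finished."""

    def __init__(self):
        self.commands = {}   # read command ID -> ReadCommandState
        self.completed = []  # read command IDs reported to the host, in order

    def register(self, read_cmd_id, total_groups):
        self.commands[read_cmd_id] = ReadCommandState(total_groups)

    def on_group_done(self, read_cmd_id, group_no):
        """Handle the end-of-transfer notification for one DMA command group.

        Returns True when this was the last outstanding group of the command.
        """
        state = self.commands[read_cmd_id]
        state.finished.add(group_no)
        if len(state.finished) < state.total_groups:
            return False
        # All groups are done: notify the host (a stand-in for a CQ entry)
        # and drop the bookkeeping, mirroring the release of shared memory.
        self.completed.append(read_cmd_id)
        del self.commands[read_cmd_id]
        return True


tracker = CompletionTracker()
tracker.register(read_cmd_id=7, total_groups=3)
# Groups finish out of order (1#, 3#, 2#), as in the first scenario above.
results = [tracker.on_group_done(7, g) for g in (1, 3, 2)]
```

Because the tracker only counts distinct finished groups per read command ID, the order in which the groups complete does not matter; the read command is reported exactly once, after its last group.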

In the embodiment shown in fig. 5, after the DMA command group is stored, the read initiator circuit can know that a certain DMA command group is stored, and at this time, the SGL unit notifies the read initiator circuit that a new DMA command group is written into the shared memory. In other embodiments, the read initiate circuit may also be notified by other circuits by detecting the data storage status in the shared memory.

It should be noted that, in the circuit shown in fig. 5, the cache unit is used for caching the SGL or the PRP, and in some embodiments, the cache unit may be omitted according to the processing speed of the SGL unit or the PRP unit. In addition, the SGL/PRP unit in fig. 5 is implemented by a hardware circuit other than the CPU, and the read initiator circuit may be implemented by the CPU or a hardware circuit other than the CPU, which may be the CPU in fig. 1C.

FIG. 6 shows the structure of the SGL/PRP unit of FIG. 5, which includes an SGL unit and a PRP unit; the SGL unit includes one SGL branch, and the PRP unit includes one PRP branch.

The SGL branch circuit comprises an SGL acquisition sub-circuit, an SGL cache sub-circuit and an SGL analysis sub-circuit, wherein the SGL acquisition sub-circuit is used for acquiring the SGL or acquiring the SGL from a host according to an SGL pointer of the read command; the SGL caching sub-circuit is used for caching the SGL; and the SGL parsing subcircuit is used for generating at least one DMA command group according to the SGL and storing the DMA command group in the shared memory.

The PRP branch circuit comprises a PRP obtaining sub-circuit, a PRP caching sub-circuit and a PRP analyzing sub-circuit, wherein the PRP obtaining sub-circuit is used for obtaining the PRP or obtaining the PRP from a host according to a PRP pointer of the read command; a PRP cache sub-circuit for caching PRP; and the PRP analysis subcircuit is used for generating at least one DMA command group according to the PRP and storing the DMA command group in the shared memory.

The multiplexer is used for connecting the reading initiating circuit. After the DMA command group store is complete, the multiplexer connects either the PRP resolution sub-circuit to the read initiation circuit or the SGL resolution sub-circuit to the read initiation circuit to output a notification of the store completion to the read initiation circuit. As can be seen from the description of FIG. 5, when the DMA command group store is complete, the read initiate circuit may also be notified by other circuits, and thus in other embodiments, the multiplexer may be omitted.

The PRP obtaining sub-circuit, the PRP analyzing sub-circuit, the SGL obtaining sub-circuit, and the SGL analyzing sub-circuit may be implemented by hardware circuits, for example, such hardware circuits may be generated by a hardware description language and a corresponding process.

As can be seen from fig. 6, the SGL/PRP unit includes both SGL and PRP units, so the SGL/PRP unit can handle both SGL and PRP. For example, the SGL/PRP unit may process a read command related to SGL and a read command related to PRP at the same time, or the SGL/PRP unit may process a read command related to SGL alone or a read command related to PRP alone; if the SGL/PRP unit processes the read command related to the SGL alone, the PRP unit in the SGL/PRP unit does not operate, and if the SGL/PRP unit processes the read command related to the PRP alone, the SGL unit in the SGL/PRP unit does not operate.

As another example, in some scenarios, only the read command related to the SGL or the read command related to the PRP needs to be processed, and in order to reduce the cost and save hardware resources, the SGL/PRP unit shown in fig. 6 may be modified to obtain another SGL/PRP unit structure, where only the SGL unit or the PRP unit is reserved in the SGL/PRP unit structure, and other structures and connection relationships in the read command processing circuit remain unchanged. Specifically, the improved SGL/PRP unit and the corresponding read command processing circuit thereof are not described in detail herein. In the embodiment shown in fig. 6, not only the SGL but also the PRP can be analyzed, that is, a unified way is provided to process the SGL or the PRP, and the CPU does not need to care about the difference between the SGL and the PRP, thereby reducing the complexity of the processing.

Fig. 7A shows the principle of the SGL parsing sub-circuit (the PRP parsing sub-circuit works on the same principle and is therefore not described separately). Fig. 7A uses the SGL example of fig. 2 and may be read in conjunction with fig. 2.

As shown in FIG. 7A, the SGL corresponding to a read command includes SGL descriptor 0-1, SGL descriptor 1-1, SGL descriptor 1-2, SGL descriptor 1-3, SGL descriptor 2-1 and SGL descriptor 2-2; these SGL descriptors correspond, in order, to a plurality of address spaces in the host memory, referred to as memory block A, memory block B, memory block C, memory block D, memory block E, and memory block F. The SGL parsing sub-circuit generates several DMA command groups based on the SGL descriptors described above.

For example, first, the size of the memory block a corresponding to the SGL descriptors 0-1 is 3KB, and the address corresponding to the memory block a is included in the first DMA command group 1; the size of memory block B corresponding to SGL descriptor 1-1 is 2KB, and the address corresponding to a portion of memory block B-1(1KB) of memory block B is also included in first DMA command group 1, thereby generating a DMA command group 1(4 KB). DMA command set 1 includes two DMA commands: DMA command 1 and DMA command 2, DMA command 1 being used to transfer 3KB of data and DMA command 2 being used to transfer 1KB of data.

Then, the addresses corresponding to the remaining partial memory block B-2(1KB) of memory block B are included in the second DMA command group 2; the corresponding addresses of the SGL descriptors 1-2 (memory block C, 2KB) and 1-3 (memory block D, 1KB) are then included in the second DMA command group 2, and the second DMA command group 2 is completed. DMA command set 2 includes three DMA commands: DMA command 3, DMA command 4, and DMA command 5, DMA command 3 being used to move 1KB of data, DMA command 4 being used to move 2KB of data, and DMA command 5 being used to move 1KB of data.

Next, the address corresponding to SGL descriptor 2-1 (memory block E, 4KB) is included in the third DMA command group 3, and the generation of the third DMA command group 3 is complete. DMA command set 3 includes one DMA command, DMA command 6, for moving 4KB of data.

Finally, the address corresponding to SGL descriptor 2-2 (memory block F, 1KB) is included in the fourth DMA command group 4, and the generation of the fourth DMA command group 4 is complete. DMA command group 4 includes one DMA command, DMA command 7, for moving 1KB of data. A total of 4 DMA command groups are generated. The data size indicated by each of DMA command group 1, DMA command group 2, and DMA command group 3 is 4KB, and the data size indicated by DMA command group 4 is 1KB. The data size indicated by each DMA command may differ and is related to the size of the address space described by the corresponding SGL descriptor; for example, the data size indicated by DMA command 1 is the same as the size of the address space described by SGL descriptor 0-1, whereas the data size indicated by DMA command 2 (1KB) is smaller than the size of the address space described by SGL descriptor 1-1 (2KB). In general, the data size indicated by a DMA command is less than or equal to the size of the address space described by the SGL descriptor.
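The splitting walked through above can be reproduced with a short software sketch. It is a minimal model under stated assumptions rather than the hardware parsing sub-circuit: the function name `build_dma_groups` and the host base addresses are invented for illustration, while the block sizes (3KB, 2KB, 2KB, 1KB, 4KB, 1KB) and the 4KB group size come from the example.

```python
KB = 1024
GROUP_SIZE = 4 * KB  # fixed DMA command group size used in the example


def build_dma_groups(descriptors, group_size=GROUP_SIZE):
    """Split (host_address, length) descriptors into DMA command groups.

    Each group is a list of (host_address, length) DMA commands whose
    lengths sum to group_size, except possibly for the last group.
    """
    groups, current, room = [], [], group_size
    for addr, length in descriptors:
        while length > 0:
            chunk = min(length, room)      # take what still fits in the group
            current.append((addr, chunk))  # one DMA command
            addr += chunk
            length -= chunk
            room -= chunk
            if room == 0:                  # group is full: close it
                groups.append(current)
                current, room = [], group_size
    if current:                            # trailing, shorter group
        groups.append(current)
    return groups


# Memory blocks A-F; the base addresses are hypothetical.
sgl = [
    (0x10000, 3 * KB),  # A, SGL descriptor 0-1
    (0x20000, 2 * KB),  # B, SGL descriptor 1-1
    (0x30000, 2 * KB),  # C, SGL descriptor 1-2
    (0x40000, 1 * KB),  # D, SGL descriptor 1-3
    (0x50000, 4 * KB),  # E, SGL descriptor 2-1
    (0x60000, 1 * KB),  # F, SGL descriptor 2-2
]
groups = build_dma_groups(sgl)
sizes = [[length // KB for _, length in g] for g in groups]
# sizes == [[3, 1], [1, 2, 1], [4], [1]]
```

The result has four groups containing 2, 3, 1 and 1 DMA commands respectively, seven DMA commands in total, which matches DMA commands 1 to 7 in the description. Memory block B is the only block split across two groups.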

Fig. 7A also shows that, when a DMA command group is generated by parsing the SGL, a storage device memory space (DRAM space) is allocated to the DMA command group. Thus, the DMA command group indicates a mapping between the host memory address space and the storage device memory address space. For read commands, DMA commands are used to move data from the storage device memory to the host memory. As can be seen from FIG. 7A, the present application splits a read command of indefinite length into multiple DMA command groups of fixed length (only the last DMA command group may be smaller than the fixed length); for example, each DMA command group indicates 4KB of data, which makes read command processing more convenient and keeps it aligned.

FIG. 7B shows read command processing in the prior art. As shown in fig. 7B, a read command X includes a host memory address and a logical address (LBA); the host memory address is expressed by an SGL, and the logical address range is LBA: 0-16KB, that is, read command X requires 17KB of data to be read from the storage device. Depending on the size of the physical page (e.g., 4KB), the back-end module divides read command X into 5 DTUs when processing it, shown as DTU0-DTU4 in fig. 7B. The logical address corresponding to DTU0 is LBA: 0-3KB; to DTU1, LBA: 4-7KB; to DTU2, LBA: 8-11KB; to DTU3, LBA: 12-15KB; and to DTU4, LBA: 16KB. That is, the first four DTUs are 4KB in size, and the last DTU is 1KB in size.

For the read command, since the storage medium includes a plurality of physical units, and the operating status of each physical unit is different, it is unpredictable which DTU will be processed first and which DTU will be processed later. For example, for read command X, DTU2 is processed first, i.e., read from the storage medium into the storage device memory. At this time, according to the prior art scheme, the SGL needs to be traversed to determine the host memory address corresponding to DTU2. In fig. 7B, the SGL is represented as a plurality of linked list nodes (corresponding to the SGL descriptors in fig. 7A) e1, e2, e3, e4, e5, e6, e7, e8, e9. When traversing the SGL, the storage space size of each node is accumulated starting from e1. Since the logical address corresponding to DTU2 is 8-11KB, it is only after e4 has been counted that e1(1KB) + e2(3KB) + e3(512B) + e4(3.5KB) = 8KB is found, so that the host memory address corresponding to DTU2 is known to start at e5; and based on e5(2KB) + e6(1KB) + e7(1KB) = 4KB, the host memory address space corresponding to DTU2 consists of the host memory address spaces indicated by nodes e5, e6 and e7. As can be seen from fig. 7B, when a DTU in the middle or toward the end of a logical address range is read, a complicated calculation is required to obtain the host memory address, which results in low processing efficiency for the entire read command.
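The cost of the prior-art traversal can be illustrated with a small sketch. The sizes of nodes e1 to e7 are taken from the example; the sizes of e8 and e9 are not given in the text, so the values below (4KB and 1KB, chosen only so that the total is 17KB) are assumptions, as is the function name `find_host_segments`.

```python
KB = 1024

# Node sizes in bytes. e1-e7 follow the example; e8 and e9 are assumed.
sgl_sizes = [1 * KB, 3 * KB, 512, 3584, 2 * KB, 1 * KB, 1 * KB, 4 * KB, 1 * KB]


def find_host_segments(sizes, offset, length):
    """Walk the SGL from its first node to locate [offset, offset + length).

    Returns the matching (node_index, offset_in_node, byte_count) pieces and
    the number of nodes that had to be visited.
    """
    segments, visited, pos = [], 0, 0
    end = offset + length
    for i, size in enumerate(sizes):
        if pos >= end:
            break
        visited += 1
        lo, hi = max(pos, offset), min(pos + size, end)
        if lo < hi:  # this node overlaps the requested range
            segments.append((i, lo - pos, hi - lo))
        pos += size
    return segments, visited


# DTU2 covers logical offsets 8KB-12KB of read command X.
segments, visited = find_host_segments(sgl_sizes, 8 * KB, 4 * KB)
# segments == [(4, 0, 2048), (5, 0, 1024), (6, 0, 1024)]  -> e5, e6, e7
# visited == 7
```

With zero-based indices, nodes 4, 5 and 6 are e5, e6 and e7. Seven nodes are visited to serve one DTU, and the walk has to be repeated from e1 for every DTU that completes out of order.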

FIG. 7C illustrates the manner in which the present application processes a read command. As shown in fig. 7C, the DMA command groups therein are obtained according to the SGL in fig. 7B (e.g., obtained by parsing in the SGL/PRP unit of fig. 5; for the parsing process, refer to fig. 7A), i.e., five DMA command groups L0, L1, L2, L3 and L4 are obtained according to the SGL, wherein L0-L3 are each 4KB and L4 is 1KB. When DTU2 of read command X is the first to be read out from the storage medium into the memory of the storage device, the read initiate circuit (as shown in fig. 5) determines by a simple calculation that DTU2 corresponds to DMA command group L2, for example: 8 (logical start address of DTU2, in KB) / 4 (fixed DTU size, in KB) = 2, which is the sequence number of DMA command group L2. The data indicated by DMA command group L2 is then moved to the address spaces indicated by e5, e6, and e7 in the host memory according to the DMA commands contained in DMA command group L2. As can be seen from fig. 7C, the scheme of the present application avoids a complex calculation process and improves the overall processing efficiency of the read command.
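The simple calculation mentioned above amounts to an integer division followed by a table lookup. The sketch below is illustrative only: the dictionary stands in for the DMA command groups held in the shared memory, and only L0-L2 are filled in because the sizes of e8 and e9 (which make up L3 and L4) are not stated in the text. The function name `group_index_for_dtu` is hypothetical.

```python
KB = 1024
DTU_SIZE = 4 * KB  # fixed DTU / DMA command group size from the example


def group_index_for_dtu(dtu_start, dtu_size=DTU_SIZE):
    """Map the logical start offset of a DTU to its DMA command group index."""
    return dtu_start // dtu_size


# DMA command groups as (SGL node, byte count) lists, derived from the
# node sizes given for e1-e7. L3 and L4 are omitted (sizes not stated).
dma_groups = {
    0: [("e1", 1 * KB), ("e2", 3 * KB)],
    1: [("e3", 512), ("e4", 3584)],
    2: [("e5", 2 * KB), ("e6", 1 * KB), ("e7", 1 * KB)],
}

index = group_index_for_dtu(8 * KB)      # DTU2 starts at logical offset 8KB
targets = [node for node, _ in dma_groups[index]]
# index == 2, targets == ["e5", "e6", "e7"]
```

The lookup takes the same small amount of work regardless of where the DTU lies in the logical address range, because the per-node arithmetic was done once when the DMA command groups were generated.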

For example, after the SGL/PRP unit processes a first read command to generate one or more DMA command groups and stores them in the shared memory, the read control circuit performs data transfer according to the DMA command groups corresponding to the first read command while the SGL/PRP unit processes the next, second read command and generates one or more DMA command groups for it. That is, the data transfer performed by the read control circuit for the first read command and the generation of DMA command groups by the SGL/PRP unit for the second read command proceed simultaneously.

Fig. 8 shows a parallel processing scheme of the SGL/PRP unit and the DMA transfer circuit, in which different filling background patterns are used to indicate different stages of each module. As shown in fig. 8, three read commands, i.e., a read command a, a read command b, and a read command c, are stored in the shared memory.

In one embodiment, read command a has generated DMA command group a-1, DMA command group a-2, and DMA command group a-3 prior to the current time; read command c has generated DMA command set c-1, DMA command set c-2, and DMA command set c-3; read command b has generated DMA command group b-1. Moreover, the data indicated by the DMA command group a-1, the DMA command group a-2, and the DMA command group c-1 has also been moved by the DMA transfer circuit. The data indicated by DMA command set b-1, DMA command set c-2 is waiting to be processed.

As shown in FIG. 8, at the present time, the SGL/PRP unit is processing read command b to generate DMA command group b-2 (the blank portion to the right of which indicates that more DMA command groups, e.g., DMA command group b-3, DMA command group b-4, may also be generated from read command b). At the same time, the DMA transfer circuit is moving the data indicated by the DMA command group a-3 and the DMA command group c-3. While the data indicated by DMA command set b-1, DMA command set c-2 is waiting to be processed.

Therefore, on the one hand, the generation of the DMA command group corresponding to the read command by the SGL/PRP unit and the transfer of data by the DMA transmission circuit are performed in parallel. In other words, after the SGL/PRP unit generates DMA command group a (DMA command group a-1, a-2 or a-3) according to read command a and writes the DMA command group a into the shared memory, whether DMA command group a is completely processed by the DMA transfer circuit or not, the SGL/PRP unit does not affect the reception of read command b and read command c, and continues to generate corresponding DMA command group b (DMA command group b-1, b-2 or b-3) and DMA command group c (DMA command group c-1, c-2 or c-3) according to read command b and read command c.

On the other hand, the DMA transfer circuit does not process DMA commands in any fixed order, and does not distinguish which read command a DMA command belongs to. Also, the DMA transfer circuit may process multiple DMA command groups in parallel (as in FIG. 8, where DMA command group a-3 and DMA command group c-3 are being processed at the same time).

In yet another aspect, the shared memory serves as a data cache, in which multiple read commands and DMA command groups corresponding to the multiple read commands may be stored simultaneously.

FIG. 9 illustrates a parallel processing mechanism of the SGL/PRP unit, the read initiate circuit, the DMA transfer circuit, and the back-end module. In fig. 9, T0-T7 represent successive time periods, and the content below each time period represents the content of the operation performed by the respective cell in that time period.

During time period T0, read command 1 is stored in the shared memory, and the SGL/PRP unit generates DMA command group 1, DMA command group 2, and DMA command group 3 from read command 1 (for ease of reading, successive DMA command groups are written as DMA command groups 1, 2, 3, and similarly below). Because the SGL/PRP unit is still generating the DMA command groups for read command 1 during T0, the read initiate circuit, the back-end module, and the DMA transfer circuit are idle in T0.

During time period T1, the shared memory stores pending read command 2, and the SGL/PRP unit generates DMA command groups 4, 5, 6, 7 based on read command 2. Since the SGL/PRP unit has completed processing read command 1 to generate DMA command groups 1, 2, 3 for time period T0, the read initiate circuit starts accessing the back end module for time period T1, and waits for the data of read command 1 to be read out to the DRAM; the back-end module starts processing read command 1 and the DMA transfer circuit remains idle provided that no data has been read at this time.

In time period T2, the shared memory stores pending read command 3, and the SGL/PRP unit generates DMA command groups 8, 9, 10 from read command 3. Since the SGL/PRP unit processed read command 2 at time period T1, DMA command groups 4, 5, 6, 7 are generated; the back-end module therefore also starts processing read command 2 and at this point the data of DMA command group 1 is read out into the DRAM, in response to which the read initiate circuit initiates the processing of DMA command group 1 and the DMA transfer circuit starts processing DMA command group 1.

In time period T3, the shared memory stores pending read command 4, and the SGL/PRP unit generates DMA command groups 11, 12, 13 … 100 based on read command 4. Since the SGL/PRP unit processed read command 3 and generated DMA command groups 8, 9, 10 in time period T2, the back-end module also starts processing read command 3, and at this time the data of DMA command group 2 is read out into the DRAM, in response to which the read initiate circuit initiates the processing of DMA command group 2 and the DMA transfer circuit starts processing DMA command group 2.

During time period T4, no new read command is stored in the shared memory, so the SGL/PRP unit is in a wait state waiting for a new read command to arrive. Since the SGL/PRP unit processed read command 4 and generated DMA command groups 11-100 in time period T3, the back-end module also starts processing read command 4, and at this time the data of DMA command group 4 is read out into the DRAM, in response to which the read initiate circuit initiates the processing of DMA command group 4 and the DMA transfer circuit starts processing DMA command group 4.

During time period T5, no new read command is yet stored in the shared memory, and therefore the SGL/PRP unit is in a wait state waiting for a new read command to arrive. The back end module still processes read commands 1, 2, 3, 4 at this time. Meanwhile, since the data of the DMA command group 3 is read out into the DRAM, in response to which the read initiate circuit initiates the processing of the DMA command group 3, the DMA transfer circuit starts processing the DMA command group 3.

During time period T6, the shared memory stores pending read command 5, and the SGL/PRP unit generates DMA command group 100-. At this time, the data of the DMA command group 7 is read out into the DRAM, and in response to this, the read initiate circuit initiates the processing of the DMA command group 7, and the DMA transfer circuit starts processing the DMA command group 7. In addition, since the DMA transfer circuit completes processing of the DMA command group 3 in the time period T5, the read initiator circuit further recognizes that all the processing of the DMA command groups 1, 2, and 3 corresponding to the read command 1 is completed, and may send a notification that the processing of the read command 1 is completed to the host interface; also, since read command 1 is processed, the back end module no longer processes read command 1.

During time period T7, no new read command is stored in the shared memory, so the SGL/PRP unit is in a wait state waiting for a new read command to arrive. Since the SGL/PRP unit generated the DMA command groups 100, 101 … 110 from read command 5 in time period T6, the back-end module also starts processing read command 5, and at this time the data of DMA command group 8 is read out into the DRAM, in response to which the read initiate circuit initiates the processing of DMA command group 8 and the DMA transfer circuit starts processing DMA command group 8. Since only read command 1 has been completed, the back-end module is still processing read commands 2, 3, 4, 5.

As can be seen from FIG. 9, first, the read initiate circuit responds in the order in which data is read out into the DRAM, regardless of the order of the read commands or of the DMA command groups. Second, the DMA transfer circuit processes the DMA command group specified by the read initiate circuit, without regard to which read command that DMA command group corresponds to. Third, while the SGL/PRP unit generates a DMA command group according to a read command, the read initiate circuit may initiate the processing of another DMA command group; that is, the SGL/PRP unit and the read initiate circuit also operate in parallel.

When the data transfer indicated by one DMA command is finished or when the data transfer indicated by one DMA command group is finished, the DMA transfer circuit generates a notification of the end of the data transfer. For example, when the data transfer of the DMA command group 1, the DMA command group 2, the DMA command group 4, and the DMA command group 3 is completed, the read initiator circuit receives the corresponding notification. In the time period T5, the read initiate circuit receives the notification that the data transfer of the DMA command group 1, the DMA command group 2, and the DMA command group 4 is completed, and at this time, the read initiate circuit may determine that both the read command 1 and the read command 2 are not completed. And in a time period of T6, the read initiation circuit receives the notification of the end of the data transfer of the DMA command group 3, and the read initiation circuit recognizes that the DMA command group 1, the DMA command group 2, and the DMA command group 3 belong to the same read command 1 and are all DMA command groups of the read command 1. In one embodiment, the read initiate circuit may notify the host of the completion of the execution after the completion of the execution of each read command (e.g., read command 1, read command 2, or read command 3), and free the space in the shared memory for the read command and the set of DMA commands corresponding to the read command. In another embodiment, the read initiator circuit may recognize that the data indicated by all DMA command groups corresponding to the read commands are completely transferred, and at this time, the read initiator circuit generates a notification that all the read commands (e.g., read command 1, read command 2, and read command 3) are completely executed, and sends the notification that all the read commands are completely executed to the host.

As can be seen from the description of fig. 8 and fig. 9, the parallel processing characteristic of the embodiments of the present application further ensures the processing efficiency of IO commands; that is, the SGL/PRP generation does not conflict with the data transfer, and what has been generated and stored can be used directly during the data transfer without delay. Moreover, the parallel processing does not require one read command to be completely processed before the next can be processed, so conflicts between read commands are avoided and each read command can be responded to in a relatively timely manner.

FIG. 10 shows a multi-branch SGL/PRP unit structure. Fig. 10 differs from fig. 6 in that the SGL unit includes two parallel SGL branches and the PRP unit includes two parallel PRP branches; each SGL or PRP branch independently processes a respective read command to generate an SGL or PRP corresponding to that read command, and stores one or more DMA command groups corresponding to each read command into the shared memory. The SGL and PRP branches have the same structure as shown in fig. 6. In a practical application scenario, the number of SGL branches that an SGL unit contains may be 3, 4, 5 or even more. The specific number of SGL branches may be determined by a combination of product requirements and cost. Similarly, the PRP unit may also include more PRP branches. Processing multiple read commands in parallel in this multi-branch manner reduces the possibility of conflicts caused by resource preemption among the processing of multiple read commands.

More SGL or PRP legs enable the circuit to service more read commands simultaneously. In a practical application scenario, the multi-branch embodiment shown in fig. 10 is suitable for a network service scenario in which a server simultaneously serves multiple user requests. Under such a condition, not only the parallel processing manner described in fig. 8 and fig. 9 above is also applicable to the embodiment shown in fig. 10, but also a read command with specific characteristics can be selected to be processed by a specified branch, that is, an SGL/PRP generated by a certain SGL/PRP branch can be preferentially selected to be processed, which is equivalent to providing priority or QoS (Quality of Service) for each read command.

In one embodiment, when several branches (including SGL branch and/or PRP branch) work in parallel, each branch processes one read command, and one read command may be processed by only one branch, and each read command has exclusive space in the shared memory. For example, the DMA command group generated by the branch 1 and the DMA command group generated by the branch 2 occupy different spaces, respectively, thereby avoiding the contention of the branch concurrent work for the shared memory resource.

In another embodiment, there are two branches, with three read commands coming in; of these three read commands, the first two read commands have the agreed upon characteristics, and the third read command does not. Then, the two branches respond to the first two read commands preferentially, ignore the third read command, or respond to the third read command after completing the processing of the first two read commands; it is also possible that one of the branches processes the first two read commands and the other branch processes the third read command.

In yet another embodiment, there are 10 branches, and 5 branches may be assigned to VIP user a, 4 branches to VIP user B, and the remaining 1 branch used by all other ordinary users. Therefore, the VIP user can be guaranteed to have priority on most branches, all branches are prevented from being occupied by the VIP user, and the QoS capability is provided for the controller. For example, the user identity may be associated with a read command, and when receiving the read command, the controller may determine whether the read command is from a VIP user or a normal user according to the user identity, and may further assign the read command to a corresponding branch.
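The branch reservation in this embodiment can be sketched as a simple partition of branch indices. This is an illustrative model only: the function names, the user labels "A" and "B", and the "shared" pool name are hypothetical, and the application does not specify how a controller would actually select a free branch.

```python
def build_branch_pools(total_branches, reservations):
    """Partition branch indices into per-user pools plus one shared pool.

    reservations maps a user identity to the number of branches reserved.
    """
    pools, next_branch = {}, 0
    for user, count in reservations.items():
        pools[user] = list(range(next_branch, next_branch + count))
        next_branch += count
    if next_branch > total_branches:
        raise ValueError("more branches reserved than available")
    # Whatever is left over is shared by all other (ordinary) users.
    pools["shared"] = list(range(next_branch, total_branches))
    return pools


def pick_branch(pools, user, busy):
    """Return a free branch for the user, or None if the pool is fully busy."""
    pool = pools.get(user, pools["shared"])
    for branch in pool:
        if branch not in busy:
            return branch
    return None


# 10 branches: 5 for VIP user A, 4 for VIP user B, 1 for everyone else.
pools = build_branch_pools(10, {"A": 5, "B": 4})
vip_branch = pick_branch(pools, "A", busy={0, 1})
guest_branch = pick_branch(pools, "guest", busy=set())
guest_blocked = pick_branch(pools, "guest", busy={9})
```

In this sketch a read command from an ordinary user can only land on the single shared branch, so it waits (here, `None` is returned) when that branch is busy, while the VIP pools remain unaffected.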

According to an aspect of the present application, embodiments of the present application further provide a memory, which refers to the memory device 102 shown in fig. 1A and 1B, the memory device 102 includes an interface 103, a control unit 104, one or more NVM chips 105, and a DRAM 110. The host command processing unit 1042 is included in the control unit, and the host command processing unit 1042 adopts the circuit described in the above embodiments, which is not described in detail herein since the circuit has been described in detail above.

According to an aspect of the present application, an electronic device is further provided in an embodiment of the present application, where the electronic device includes a processor and a memory, and the memory is the memory mentioned in the above embodiments. Since it has been described in detail above, it will not be described in detail.

It is noted that for the sake of brevity, this application describes some methods and embodiments thereof as a series of acts and combinations thereof, but those skilled in the art will appreciate that the aspects of the application are not limited by the order of the acts described. Accordingly, one of ordinary skill in the art will appreciate that certain steps may be performed in other sequences or simultaneously, in accordance with the disclosure or teachings herein. Further, those skilled in the art will appreciate that the embodiments described herein are capable of alternative embodiments, i.e., acts or modules referred to herein are not necessarily required for the implementation of the solution or solutions described herein. In addition, the description of some embodiments of the present application is also focused on different schemes. In view of the above, those skilled in the art will understand that portions that are not described in detail in one embodiment of the present application may also be referred to in the related description of other embodiments.

In particular implementation, based on the disclosure and teachings of the present application, one of ordinary skill in the art will appreciate that the several embodiments disclosed in the present application may be implemented in other ways not disclosed herein. For example, as for the units in the foregoing embodiments of the electronic device or apparatus, the units are split based on the logic function, and there may be another splitting manner in the actual implementation. Also for example, multiple units or components may be combined or integrated with another system or some features or functions in a unit or component may be selectively disabled. The connections discussed above in connection with the figures may be direct or indirect couplings between the units or components in terms of connectivity between the different units or components. In some scenarios, the aforementioned direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustic, magnetic, or other forms of signal transmission.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
