NVMe solid state disk exception handling method, device and integrated chip

Document No.: 784757; Published: 2021-04-09

Abstract: This technology, an NVMe solid state disk exception handling method, device and integrated chip, was designed by 刘海亮, 黄锐, 汪再金 and 刘洋 on 2020-12-23. The application discloses an NVMe solid state disk exception handling method, device and integrated chip. The method comprises: when the NAND Flash controller detects an error during direct transmission, it sets the data_trans_err signal of its transmission interface with the NVMe controller to 1 and reports the error information to the CPU; when the NVMe controller detects that the data_trans_err signal is 1, it reads the erroneous data out of the NAND Flash controller, sends a handshake signal to the NAND Flash controller, and then discards the data inside the NVMe controller; and when the CPU receives the error information, it calls the RAID module to recover the data. The application marks, handles and recovers from transmission errors, prevents the read operation flow from stalling or stopping because of such an error, and reduces the impact of the error on data transmission.

1. An NVMe solid state disk exception handling method is characterized by comprising the following steps:

when a NAND Flash controller detects an error during direct transmission, setting, by the NAND Flash controller, the data_trans_err signal of its transmission interface with an NVMe controller to 1, and reporting the error information to a CPU;

when the NVMe controller detects that the data_trans_err signal is 1, reading the erroneous data out of the NAND Flash controller through the NVMe controller, sending a handshake signal to the NAND Flash controller, and then discarding the data inside the NVMe controller;

and when the CPU receives the error information, calling a RAID module through the CPU to recover the data.

2. The NVMe solid state disk exception handling method of claim 1, further comprising:

when the RAID module attempts to recover the data and the recovery fails, the following operations are carried out:

configuring a DMA command through the CPU, and setting the data_cmd_err field in the DMA command to 1;

when the NVMe controller receives the DMA command, moving the data corresponding to the DMA command through the NVMe controller so that it is written into host memory, setting the data_cmd_err field in a first execution result corresponding to the DMA command to 1, and reporting the first execution result to the CPU;

and when the NVMe controller completes the data transfer of all DMA commands corresponding to the NVMe command, setting, by the NVMe controller, the data_cmd_err field in a second execution result corresponding to the NVMe command to 1, and reporting the second execution result to the CPU.

3. The NVMe solid state disk exception handling method of claim 2, further comprising:

when the CPU receives the second execution result, issuing, by the CPU, the specific completion queue data to the CQ corresponding to the NVMe command, with the error status of the queue set to 1;

and writing the specific data into the host memory through the NVMe controller, and sending an MSI-X interrupt message, an MSI interrupt message or a pin-based interrupt message to the host according to the PCIe configuration.

4. The NVMe solid state disk exception handling method of claim 1, further comprising:

when the RAID module recovers the data successfully, the following operations are carried out:

writing the successfully recovered data into an on-chip cache or a DDR memory of the SSD controller through the CPU, and configuring a corresponding DMA command;

when the NVMe controller receives the DMA command, moving the data corresponding to the DMA command through the NVMe controller so that it is written into host memory, and reporting a first execution result corresponding to the DMA command to the CPU;

and when the NVMe controller completes the data transfer of all DMA commands corresponding to the NVMe command, reporting a second execution result corresponding to the NVMe command to the CPU through the NVMe controller.

5. The NVMe solid state disk exception handling method of claim 4, further comprising:

and returning, through the NVMe controller, CQ data indicating successful execution to the host.

6. The NVMe solid state disk exception handling method of claim 4, further comprising:

when the CPU receives the second execution result, issuing, by the CPU, the specific completion queue data to the CQ corresponding to the NVMe command;

and writing the specific data into the host memory through the NVMe controller, and sending an MSI-X interrupt message, an MSI interrupt message or a pin-based interrupt message to the host according to the PCIe configuration.

7. An NVMe solid state disk exception handling device is characterized by comprising a NAND Flash controller, an NVMe controller and a CPU, wherein:

when the NAND Flash controller detects an error during direct transmission, it sets the data_trans_err signal of its transmission interface with the NVMe controller to 1, and reports the error information to the CPU;

when the NVMe controller detects that the data_trans_err signal is 1, it reads the erroneous data out of the NAND Flash controller, sends a handshake signal to the NAND Flash controller, and then discards the data inside the NVMe controller;

and when the CPU receives the error information, it calls a RAID module to recover the data.

8. The NVMe solid state disk exception handling device of claim 7, wherein:

when the RAID module attempts to recover the data and the recovery fails, the CPU configures a DMA command and sets the data_cmd_err field in the DMA command to 1;

when the NVMe controller receives the DMA command, it moves the data corresponding to the DMA command so that it is written into host memory, sets the data_cmd_err field in a first execution result corresponding to the DMA command to 1, and reports the first execution result to the CPU;

and when the NVMe controller completes the data transfer of all DMA commands corresponding to the NVMe command, it sets the data_cmd_err field in a second execution result corresponding to the NVMe command to 1 and reports the second execution result to the CPU.

9. The NVMe solid state disk exception handling device of claim 8, wherein:

when the CPU receives the second execution result, it issues the specific completion queue data to the CQ corresponding to the NVMe command, with the error status of the queue set to 1;

and the NVMe controller writes the specific data into the host memory, and sends an MSI-X interrupt message, an MSI interrupt message or a pin-based interrupt message to the host according to the PCIe configuration.

10. An integrated chip, comprising:

the NVMe solid state disk exception handling device of any one of claims 7-9.

Technical Field

The invention relates to the field of solid state disk operation, in particular to an NVMe solid state disk exception handling method, device and integrated chip.

Background

Currently, NVMe (Non-Volatile Memory express) Solid State Disks (SSDs) are gaining more and more attention in the storage field thanks to their low latency, low power consumption and high bandwidth, and have become a new trend in storage device development. The components on the data path of an NVMe SSD controller mainly include PCIe (Peripheral Component Interconnect express), the NVMe controller, an on-chip cache or DDR (Double Data Rate synchronous dynamic random access memory), the NAND Flash controller, an LDPC (Low Density Parity Check) codec, RAID (Redundant Array of Independent Disks), a CPU (Central Processing Unit), and the like. To improve read bandwidth, a recent read method transmits the data fragments that the NAND Flash controller reads from the NAND chips directly to the NVMe controller through a hardware circuit. This avoids the waiting time in the cache or DDR of the SSD controller, and the hardware direct-transmission path also saves the time the CPU would spend configuring NVMe DMA (Direct Memory Access), thereby improving the read bandwidth of the solid state disk.

However, once a data read error occurs in this process, the read operation of the solid state disk stalls and cannot continue. How to handle the read error and recover the data is a problem to be solved by those skilled in the art.

Disclosure of Invention

In view of this, the present invention provides an NVMe solid state disk exception handling method, an NVMe solid state disk exception handling device, and an integrated chip. The specific scheme is as follows:

an NVMe solid state disk exception handling method comprises the following steps:

when a NAND Flash controller detects an error during direct transmission, setting, by the NAND Flash controller, the data_trans_err signal of its transmission interface with an NVMe controller to 1, and reporting the error information to a CPU;

when the NVMe controller detects that the data_trans_err signal is 1, reading the erroneous data out of the NAND Flash controller through the NVMe controller, sending a handshake signal to the NAND Flash controller, and then discarding the data inside the NVMe controller;

and when the CPU receives the error information, calling a RAID module through the CPU to recover the data.

Preferably, the NVMe solid state disk exception handling method further includes:

when the RAID module attempts to recover the data and the recovery fails, the following operations are carried out:

configuring a DMA command through the CPU, and setting the data_cmd_err field in the DMA command to 1;

when the NVMe controller receives the DMA command, moving the data corresponding to the DMA command through the NVMe controller so that it is written into host memory, setting the data_cmd_err field in a first execution result corresponding to the DMA command to 1, and reporting the first execution result to the CPU;

and when the NVMe controller completes the data transfer of all DMA commands corresponding to the NVMe command, setting, by the NVMe controller, the data_cmd_err field in a second execution result corresponding to the NVMe command to 1, and reporting the second execution result to the CPU.

Preferably, the NVMe solid state disk exception handling method further includes:

when the CPU receives the second execution result, issuing, by the CPU, the specific completion queue data to the CQ corresponding to the NVMe command, with the error status of the queue set to 1;

and writing the specific data into the host memory through the NVMe controller, and sending an MSI-X interrupt message, an MSI interrupt message or a pin-based interrupt message to the host according to the PCIe configuration.

Preferably, the NVMe solid state disk exception handling method further includes:

when the RAID module recovers the data successfully, the following operations are carried out:

writing the successfully recovered data into an on-chip cache or a DDR memory of the SSD controller through the CPU, and configuring a corresponding DMA command;

when the NVMe controller receives the DMA command, moving the data corresponding to the DMA command through the NVMe controller so that it is written into host memory, and reporting a first execution result corresponding to the DMA command to the CPU;

and when the NVMe controller completes the data transfer of all DMA commands corresponding to the NVMe command, reporting a second execution result corresponding to the NVMe command to the CPU through the NVMe controller.

Preferably, the NVMe solid state disk exception handling method further includes:

returning, through the NVMe controller, CQ data indicating successful execution to the host.

Preferably, the NVMe solid state disk exception handling method further includes:

when the CPU receives the second execution result, issuing, by the CPU, the specific completion queue data to the CQ corresponding to the NVMe command;

and writing the specific data into the host memory through the NVMe controller, and sending an MSI-X interrupt message, an MSI interrupt message or a pin-based interrupt message to the host according to the PCIe configuration.

Correspondingly, this application also discloses an NVMe solid state disk exception handling device, comprising a NAND Flash controller, an NVMe controller and a CPU, wherein:

when the NAND Flash controller detects an error during direct transmission, it sets the data_trans_err signal of its transmission interface with the NVMe controller to 1, and reports the error information to the CPU;

when the NVMe controller detects that the data_trans_err signal is 1, it reads the erroneous data out of the NAND Flash controller, sends a handshake signal to the NAND Flash controller, and then discards the data inside the NVMe controller;

and when the CPU receives the error information, it calls a RAID module to recover the data.

Preferably, when the RAID module attempts to recover the data and the recovery fails, the CPU configures a DMA command and sets the data_cmd_err field in the DMA command to 1;

when the NVMe controller receives the DMA command, it moves the data corresponding to the DMA command so that it is written into host memory, sets the data_cmd_err field in a first execution result corresponding to the DMA command to 1, and reports the first execution result to the CPU;

and when the NVMe controller completes the data transfer of all DMA commands corresponding to the NVMe command, it sets the data_cmd_err field in a second execution result corresponding to the NVMe command to 1 and reports the second execution result to the CPU.

Preferably, when the CPU receives the second execution result, it issues the specific completion queue data to the CQ corresponding to the NVMe command, with the error status of the queue set to 1;

and the NVMe controller writes the specific data into the host memory, and sends an MSI-X interrupt message, an MSI interrupt message or a pin-based interrupt message to the host according to the PCIe configuration.

Correspondingly, this application also discloses an integrated chip, comprising:

the NVMe solid state disk exception handling device according to any one of the above.

The application discloses an NVMe solid state disk exception handling method, which comprises the following steps: when the NAND Flash controller detects an error during direct transmission, it sets the data_trans_err signal of its transmission interface with the NVMe controller to 1 and reports the error information to the CPU; when the NVMe controller detects that the data_trans_err signal is 1, it reads the erroneous data out of the NAND Flash controller, sends a handshake signal to the NAND Flash controller, and then discards the data inside the NVMe controller; and when the CPU receives the error information, it calls a RAID module to recover the data. The application thus provides exception handling for errors in direct transmission. The NAND Flash controller sets the data_trans_err signal and reports the error information. The NVMe controller, on seeing that data_trans_err is 1, still completes the handshake, which guarantees that subsequent data can be transmitted normally. Meanwhile, the CPU calls the RAID module to recover the data. The whole process marks, handles and recovers from the transmission error, prevents the read operation flow from stalling or stopping because of the error, and reduces the impact of the error on data transmission.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are only embodiments of the present invention; those skilled in the art can obtain other drawings from the provided drawings without creative effort.

Fig. 1 is a flowchart of the steps of an NVMe solid state disk exception handling method according to an embodiment of the present invention;

Fig. 2 is a flowchart of the sub-steps of an NVMe solid state disk exception handling method according to an embodiment of the present invention (RAID recovery fails);

Fig. 3 is a flowchart of the sub-steps of an NVMe solid state disk exception handling method according to an embodiment of the present invention (RAID recovery succeeds);

Fig. 4 is a structural diagram of an NVMe solid state disk exception handling device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the existing direct-transmission read method, once a data read error occurs, the read operation of the solid state disk stalls and cannot continue; how to handle the read error and recover the data is a problem to be solved by those skilled in the art. The present application marks, handles and recovers from the transmission error throughout the process, so that the read operation flow does not stall or stop because of the error and the impact of the error on data transmission is reduced.

The embodiment of the invention discloses an NVMe solid state disk exception handling method which, as shown in Fig. 1, comprises the following steps:

S1: when the NAND Flash controller detects an error during direct transmission, the NAND Flash controller sets the data_trans_err signal of its transmission interface with the NVMe controller to 1, and reports the error information to the CPU;

S2: when the NVMe controller detects that the data_trans_err signal is 1, the NVMe controller reads the erroneous data out of the NAND Flash controller, sends a handshake signal to the NAND Flash controller, and then discards the data inside the NVMe controller;

It can be understood that, on detecting that the data_trans_err signal is 1, the NVMe controller still reads the data and sends the handshake signal so that the overall read flow does not stall, and it discards the data so that garbage data is not transmitted. The discard action consists of keeping the transfer length of the data transmitted before the garbage data unchanged, and not initiating an NVMe DMA transfer.

S3: when the CPU receives the error information, the CPU calls the RAID module to recover the data.
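
Steps S1 to S3 can be pictured with a small software model. The following is only an illustrative sketch, not the hardware design of this application: the class names `Chunk`, `Cpu` and `NvmeController`, the function `direct_read`, and the counters are all hypothetical, and the RAID call is a placeholder that recovers nothing.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    """One data fragment streamed from the NAND Flash controller (hypothetical model)."""
    payload: bytes
    data_trans_err: int = 0  # 1 when the direct transfer of this fragment failed


@dataclass
class Cpu:
    """Firmware side: collects error reports and asks RAID to rebuild the data."""
    error_reports: List[int] = field(default_factory=list)

    def report_error(self, chunk_index: int) -> None:
        # S1: the NAND Flash controller reports which fragment failed.
        self.error_reports.append(chunk_index)

    def recover(self, chunk_index: int) -> Optional[bytes]:
        # S3: placeholder for the RAID call; real recovery is not modelled here.
        return None


@dataclass
class NvmeController:
    sent_to_host: List[bytes] = field(default_factory=list)
    bytes_transferred: int = 0
    handshakes: int = 0

    def accept(self, chunk: Chunk) -> None:
        # S2: the fragment is always read out and acknowledged, so the
        # NAND Flash controller is never left waiting on the interface.
        self.handshakes += 1
        if chunk.data_trans_err == 1:
            # Discard: no NVMe DMA is started and the transfer length
            # accumulated so far stays unchanged.
            return
        self.sent_to_host.append(chunk.payload)
        self.bytes_transferred += len(chunk.payload)


def direct_read(chunks: List[Chunk], nvme: NvmeController, cpu: Cpu) -> None:
    for index, chunk in enumerate(chunks):
        if chunk.data_trans_err == 1:
            cpu.report_error(index)  # S1
        nvme.accept(chunk)           # S2
    for index in cpu.error_reports:
        cpu.recover(index)           # S3
```

In this model a failed fragment still costs one handshake but contributes nothing to the host transfer, which is the property that keeps the read flow from stalling.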

Further, two cases are distinguished according to whether the data recovery succeeds. As shown in Fig. 2, when the RAID module attempts to recover the data and the recovery fails, the following operations are performed:

S41: the CPU configures a DMA command and sets the data_cmd_err field in the DMA command to 1;

S42: when the NVMe controller receives the DMA command, the NVMe controller moves the data corresponding to the DMA command so that it is written into host memory, sets the data_cmd_err field in a first execution result corresponding to the DMA command to 1, and reports the first execution result to the CPU;

S43: when the NVMe controller completes the data transfer of all DMA commands corresponding to the NVMe command, the NVMe controller sets the data_cmd_err field in a second execution result corresponding to the NVMe command to 1 and reports the second execution result to the CPU.

It can be understood that setting the data_cmd_err fields of the first and second execution results to 1 in S42 and S43 indicates that the error-carrying DMA command issued by the CPU has been executed.
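
How the data_cmd_err flag travels from the DMA command into the two execution results can be sketched as follows. This is a simplified model under stated assumptions: the types `DmaCommand` and `ExecutionResult` and the function `execute_nvme_command` are hypothetical, and the rule that the second result is flagged when any of its DMA commands was flagged is an assumption made for illustration; the text above only states that the field is set to 1 in the failure case.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DmaCommand:
    nvme_command_id: int
    length: int
    data_cmd_err: int = 0  # S41: set to 1 by the CPU when RAID recovery failed


@dataclass
class ExecutionResult:
    nvme_command_id: int
    data_cmd_err: int


def execute_nvme_command(
    dma_commands: List[DmaCommand],
) -> Tuple[List[ExecutionResult], ExecutionResult]:
    """Return (first execution results, second execution result) for one NVMe command."""
    first_results = []
    for cmd in dma_commands:
        # S42: the data is still moved to host memory (not modelled here);
        # the flag is copied into the per-DMA "first" execution result.
        first_results.append(ExecutionResult(cmd.nvme_command_id, cmd.data_cmd_err))
    # S43: after all DMA commands finish, the per-NVMe-command "second"
    # result is flagged if any DMA command was flagged (assumed rule).
    any_error = int(any(r.data_cmd_err for r in first_results))
    second_result = ExecutionResult(dma_commands[0].nvme_command_id, any_error)
    return first_results, second_result
```

Carrying the flag in the result rather than aborting the command lets the NVMe controller finish the command normally while the CPU still learns that the data it delivered was not recovered.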

Further, after the second execution result is reported to the CPU, the NVMe solid state disk exception handling method further includes:

S44: when the CPU receives the second execution result, the CPU issues the specific completion queue data to the CQ (Completion Queue) corresponding to the NVMe command, with the error status of the queue set to 1;

S45: the NVMe controller writes the specific data into host memory, and sends an MSI-X interrupt message, an MSI interrupt message or a pin-based interrupt message to the host according to the PCIe configuration.

It can be understood that the CPU sets the queue error status to 1 in order to inform the host of the execution result of the NVMe command and of the specific error that occurred.
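
Steps S44 and S45 amount to building a completion entry with an error status and then signalling the host with whichever interrupt type PCIe is configured for. The sketch below is illustrative only: `CompletionEntry` is a simplified, hypothetical record and not the completion queue entry layout defined by the NVMe specification, and `InterruptMode` and `complete_command` are names invented for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class InterruptMode(Enum):
    """Interrupt types named in the text; which one is used depends on PCIe configuration."""
    MSI_X = "MSI-X"
    MSI = "MSI"
    PIN_BASED = "pin-based"


@dataclass
class CompletionEntry:
    command_id: int
    error_status: int  # 1 tells the host that the command failed


def complete_command(
    command_id: int,
    second_result_err: int,
    host_cq: List[CompletionEntry],
    mode: InterruptMode,
) -> str:
    # S44: the CPU prepares the completion data, with the error status
    # taken from the second execution result.
    entry = CompletionEntry(command_id, error_status=1 if second_result_err else 0)
    # S45: the NVMe controller writes the entry into host memory ...
    host_cq.append(entry)
    # ... and raises the interrupt type selected by the PCIe configuration.
    return f"{mode.value} interrupt"
```

For example, `complete_command(7, 1, cq, InterruptMode.MSI_X)` appends an entry with `error_status == 1` to `cq` and returns the string `"MSI-X interrupt"`.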

Similarly, referring to Fig. 3, when the RAID module recovers the data successfully, the following operations are performed:

S51: the CPU writes the successfully recovered data into an on-chip cache or a DDR memory of the SSD controller, and configures a corresponding DMA command;

S52: when the NVMe controller receives the DMA command, the NVMe controller moves the data corresponding to the DMA command so that it is written into host memory, and reports a first execution result corresponding to the DMA command to the CPU;

S53: when the NVMe controller completes the data transfer of all DMA commands corresponding to the NVMe command, the NVMe controller reports a second execution result corresponding to the NVMe command to the CPU.

Further, after the second execution result is reported, the CQ data is processed. This processing may be implemented in hardware or in software. In the hardware implementation, the NVMe controller returns CQ data indicating successful execution to the host. In the software implementation, when the CPU receives the second execution result, the CPU issues the specific completion queue data to the CQ corresponding to the NVMe command; the NVMe controller then writes the specific data into host memory and sends an MSI-X interrupt message, an MSI interrupt message or a pin-based interrupt message to the host according to the PCIe configuration.
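
The success path, including the choice between hardware and software CQ handling, can be sketched as below. This is a toy model under assumptions: the class `SsdController`, its dictionaries standing in for the on-chip cache or DDR and for host memory, and the event strings are all hypothetical and chosen only to make the order of S51 to S53 visible.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SsdController:
    buffer: Dict[int, bytes] = field(default_factory=dict)       # on-chip cache or DDR
    host_memory: Dict[int, bytes] = field(default_factory=dict)
    host_cq: List[dict] = field(default_factory=list)
    cpu_events: List[str] = field(default_factory=list)

    def cpu_stage_recovered(self, buf_addr: int, data: bytes, host_addr: int) -> dict:
        # S51: the CPU stores the recovered data and configures a DMA command.
        self.buffer[buf_addr] = data
        return {"src": buf_addr, "dst": host_addr}

    def nvme_run_dma(self, dma: dict) -> None:
        # S52: the NVMe controller moves the data to host memory and
        # reports the first execution result to the CPU.
        self.host_memory[dma["dst"]] = self.buffer[dma["src"]]
        self.cpu_events.append("first_result")

    def nvme_finish(self, command_id: int, hardware_cq: bool) -> None:
        # S53: all DMA commands are done; report the second execution result.
        self.cpu_events.append("second_result")
        if hardware_cq:
            # Hardware variant: the NVMe controller returns success by itself.
            self.host_cq.append({"cid": command_id, "error": 0, "by": "nvme"})
        else:
            # Software variant: the CPU issues the CQ data, which the
            # NVMe controller then writes to host memory.
            self.cpu_events.append("cpu_issues_cq")
            self.host_cq.append({"cid": command_id, "error": 0, "by": "cpu"})
```

The hardware variant saves one CPU round trip per command, while the software variant leaves the firmware in control of what is written to the completion queue.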

It can be understood that the exception handling method of this embodiment relies on cooperation between software and hardware: the operations executed by the NAND Flash controller and the NVMe controller are hardware operations, the operations of the CPU are software operations, and the CPU calling the RAID module is a combined software and hardware operation. With this cooperation, the solid state disk can handle exceptions in different exception scenarios, complete error handling effectively and quickly, and avoid serious problems such as the solid state disk stalling.

The application discloses an NVMe solid state disk exception handling method, which comprises the following steps: when the NAND Flash controller detects an error during direct transmission, it sets the data_trans_err signal of its transmission interface with the NVMe controller to 1 and reports the error information to the CPU; when the NVMe controller detects that the data_trans_err signal is 1, it reads the erroneous data out of the NAND Flash controller, sends a handshake signal to the NAND Flash controller, and then discards the data inside the NVMe controller; and when the CPU receives the error information, it calls the RAID module to recover the data. The application thus provides exception handling for errors in direct transmission. The NAND Flash controller sets the data_trans_err signal and reports the error information. The NVMe controller, on seeing that data_trans_err is 1, still completes the handshake, which guarantees that subsequent data can be transmitted normally. Meanwhile, the CPU calls the RAID module to recover the data. The whole process marks, handles and recovers from the transmission error, prevents the read operation flow from stalling or stopping because of the error, and reduces the impact of the error on data transmission.

Correspondingly, the embodiment of the application further discloses an NVMe solid state disk exception handling device which, as shown in Fig. 4, comprises a NAND Flash controller 1, an NVMe controller 2 and a CPU 3, wherein:

when the NAND Flash controller 1 detects an error during direct transmission, it sets the data_trans_err signal of its transmission interface with the NVMe controller 2 to 1, and reports the error information to the CPU 3;

when the NVMe controller 2 detects that the data_trans_err signal is 1, it reads the erroneous data out of the NAND Flash controller 1, sends a handshake signal to the NAND Flash controller 1, and then discards the data inside the NVMe controller 2;

and when the CPU 3 receives the error information, it calls the RAID module to recover the data.

The device provides exception handling for errors in direct transmission. The NAND Flash controller 1 sets the data_trans_err signal and reports the error information. The NVMe controller 2, on seeing that data_trans_err is 1, still completes the handshake, which guarantees that subsequent data can be transmitted normally. Meanwhile, the CPU 3 calls the RAID module to recover the data. The whole process marks, handles and recovers from the transmission error, prevents the read operation flow from stalling or stopping because of the error, and reduces the impact of the error on data transmission.

In some specific embodiments, when the RAID module attempts to recover the data and the recovery fails, the CPU 3 configures a DMA command and sets the data_cmd_err field in the DMA command to 1;

when the NVMe controller 2 receives the DMA command, it moves the data corresponding to the DMA command so that it is written into host memory, sets the data_cmd_err field in a first execution result corresponding to the DMA command to 1, and reports the first execution result to the CPU 3;

and when the NVMe controller 2 completes the data transfer of all DMA commands corresponding to the NVMe command, it sets the data_cmd_err field in a second execution result corresponding to the NVMe command to 1 and reports the second execution result to the CPU 3.

In some specific embodiments, when the CPU 3 receives the second execution result, it issues the specific completion queue data to the CQ corresponding to the NVMe command, with the error status of the queue set to 1;

and the NVMe controller 2 writes the specific data into host memory, and sends an MSI-X interrupt message, an MSI interrupt message or a pin-based interrupt message to the host according to the PCIe configuration.

In some specific embodiments, when the RAID module recovers the data successfully, the CPU 3 writes the successfully recovered data into the on-chip cache or the DDR memory of the SSD controller, and configures a corresponding DMA command;

when the NVMe controller 2 receives the DMA command, it moves the data corresponding to the DMA command so that it is written into host memory, and reports a first execution result corresponding to the DMA command to the CPU 3;

and when the NVMe controller 2 completes the data transfer of all DMA commands corresponding to the NVMe command, it reports a second execution result corresponding to the NVMe command to the CPU 3.

In some specific embodiments, the NVMe controller 2 is further configured to return CQ data indicating successful execution to the host.

In some specific embodiments, when the CPU 3 receives the second execution result, it issues the specific completion queue data to the CQ corresponding to the NVMe command; the NVMe controller 2 then writes the specific data into host memory, and sends an MSI-X interrupt message, an MSI interrupt message or a pin-based interrupt message to the host according to the PCIe configuration.

Correspondingly, the embodiment of the present application further discloses an integrated chip, including:

the NVMe solid state disk exception handling device according to any one of the above embodiments.

The specific relevant content of the NVMe solid state disk exception handling apparatus may refer to the description in the above embodiments, and is not described herein again.

The integrated chip in this embodiment has the same technical effect as the NVMe solid state disk exception handling device in the above embodiment, which is not described here again.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The NVMe solid state disk exception handling method, the NVMe solid state disk exception handling device and the integrated chip provided by the invention have been described in detail above. Specific examples are used in this description to explain the principle and the implementation of the invention, and the description of the embodiments is only intended to help in understanding the method and the core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as a limitation of the invention.
