Method for locating a faulty eMMC in a RAIM (Redundant Array of Independent Modules) architecture SSD

Document No.: 1818148  Publication date: 2021-11-09

Reading note: This technique, "Method for locating a faulty eMMC in a RAIM-architecture SSD" (一种RAIM构架SSD中故障eMMC定位方法), was devised by Fan Lingyan (樊凌雁) and Li Junfan (李俊凡) on 2021-07-09. The invention discloses a method for locating a faulty eMMC in a RAIM-architecture SSD, comprising: S1, testing a number of SSD disks, selecting a faulty disk, and recording the faulty RCAs; S2, numbering each eMMC in the faulty disk and formatting the faulty disk; S3, measuring the color temperature of a normal disk in different working states as the reference for fault detection; S4, selecting a non-faulty RCA from S1, writing data to it, and capturing an infrared image; S5, examining the infrared image to find the normally working eMMC that corresponds to the RCA selected in S4; S6, repeating S4 and S5 to test, for every non-faulty RCA on the SSD disk, whether the corresponding eMMC is a normal chip. The invention uses infrared imaging to locate faulty chips: it compares the effect that normally working and faulty eMMCs have on the surrounding temperature and makes the judgment directly from the infrared image.

1. A method for locating a faulty eMMC in a RAIM-architecture SSD, characterized by capturing a thermal image of the eMMCs with an infrared thermal imager and judging whether an eMMC is faulty according to a preset color range.

2. The method for locating a faulty eMMC in a RAIM-architecture SSD according to claim 1, comprising:

S1, testing a plurality of SSD disks, selecting a faulty disk, and recording the faulty RCAs;

S2, numbering each eMMC in the faulty disk and formatting the faulty disk;

S3, measuring the color temperature of a normal disk in different working states as the reference for fault detection;

S4, selecting a non-faulty RCA from S1, writing data to it, and capturing an infrared image;

S5, examining the infrared image to find the normally working eMMC that corresponds to the RCA selected in S4;

S6, repeating S4 and S5 to test, for every non-faulty RCA on the faulty SSD disk, whether the corresponding eMMC is a normal chip.

3. The method for locating a faulty eMMC in a RAIM-architecture SSD according to claim 2, wherein testing the plurality of SSD disks, selecting a faulty disk, and recording the faulty RCAs specifically comprises: sending erase, write, and read commands in sequence to the eMMC of every RCA on the disk; after the erase, judging whether the erase failed by checking bit 0 on the data feedback line, where bit 0 being 0 indicates that the erase succeeded, and bit 0 still being 1 after the preset erase time has elapsed indicates an erase fault in the eMMC corresponding to that RCA; sending write and read commands, writing a specific preset value and reading it back for comparison, where a mismatch with the preset value indicates a write/read fault; considering the eMMC corresponding to the RCA faulty if any erase, write, or read fault occurs, and normal otherwise; and recording the state, normal or faulty, of the eMMC corresponding to every RCA on the disk.

4. The method according to claim 3, wherein numbering each eMMC in the faulty disk and formatting the faulty disk comprises formatting the faulty disk to clear its data, so that when one eMMC in the array is selected for operation, the other idle eMMCs do not perform garbage collection, which would heat those chips and cause misjudgment.

5. The method according to claim 4, wherein measuring the color temperature of a normal disk in different working states as the reference for fault detection specifically comprises: measuring the temperature and color temperature several times with an infrared imager; recording the temperature of the region where an eMMC is located in the idle, non-working state, together with the corresponding color range in the infrared image, as the low-temperature color; and recording the temperature of the region where an eMMC is located after writing data for 5 minutes, together with the corresponding color range in the infrared image, as the high-temperature color.

6. The method for locating a faulty eMMC in a RAIM-architecture SSD according to claim 5, wherein selecting a non-faulty RCA from S1, writing data to it, and capturing an infrared image specifically comprises: selecting one RCA that showed no fault in S1; writing data to the selected RCA for 5 minutes, so that the eMMC corresponding to the RCA works and its temperature rises to a steady state; and then capturing an infrared image with an infrared thermometer.

7. The method according to claim 6, wherein examining the infrared image to find the normally working eMMC that corresponds to the RCA selected in S4 specifically comprises: examining the image, and if the color temperature range of the region where one eMMC is located matches the high-temperature color range from S3, concluding that this eMMC is working normally, that the chip is a normal chip, and that it corresponds to the RCA addressed in S4.

8. The method according to claim 7, wherein repeating S4 and S5 to test, for every non-faulty RCA on the SSD disk, whether the corresponding eMMC is a normal chip specifically comprises: repeating S4 and S5 to test in turn whether the eMMC corresponding to each normal RCA on the disk is a normal chip; once all normal chips on the disk have been identified, the remaining chips are the faulty chips.

Technical Field

The invention belongs to the technical field of data storage and relates to a method for locating a faulty eMMC in a RAIM-architecture SSD.

Background

eMMC (Embedded Multi Media Card) is one of the mainstream storage devices at present. Compared with a single NAND Flash chip, an eMMC integrates high-density NAND Flash memory and an MCU (the MMC controller) into one BGA chip, manages the NAND Flash inside the chip, and provides a standard interface to the user. The eMMC protocol is an embedded memory standard established by the MMC Association and provides high capacity, high stability, and high read and write speeds.

In a single-eMMC system, a Controller interacts with the eMMC in five main operating modes: Boot Mode, Device Identification Mode, Interrupt Mode, Data Transfer Mode, and Inactive Mode. Device Identification Mode has three main states: Idle State, Ready State, and Identification State. In this mode, three commands are issued to the eMMC:

CMD1: command 1, which checks whether the eMMC has completed initialization;

CMD2: command 2, which obtains the CID (Device Identification number, a number that the eMMC carries from the factory and that records its factory information);

CMD3: command 3, which sets the RCA (Relative Device Address; each eMMC has a dedicated RCA register that stores it).

In Idle State, the eMMC initializes itself internally while the Controller repeatedly sends CMD1 to ask whether initialization has finished. Once initialization is complete, the eMMC enters Ready State. The Controller then sends CMD2 to obtain the CID of the eMMC; after receiving CMD2, the eMMC returns its 128-bit CID to the Controller over the CMD line (the command line of the eMMC) and then enters Identification State. In this state, the Controller allocates a 16-bit RCA to the eMMC and sends CMD3, which writes the RCA into the RCA register of the eMMC. Once the RCA is set, the eMMC has completed Device Identification, enters Data Transfer Mode, and can be used for normal data transmission.
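
The identification sequence described above can be sketched as a small state machine. This is an illustrative simulation only: the class `SimulatedEmmc`, the function `identify`, and the `init_polls` parameter are hypothetical names introduced here, not part of the eMMC standard or of the patent.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    READY = auto()
    IDENTIFICATION = auto()
    DATA_TRANSFER = auto()


class SimulatedEmmc:
    """Toy model of one eMMC during Device Identification Mode."""

    def __init__(self, cid: int, init_polls: int = 3):
        self.cid = cid                 # CID carried from the factory
        self.rca = None                # RCA register, unset until CMD3
        self.state = State.IDLE
        self._polls_left = init_polls  # hypothetical: polls until init is done

    def cmd1(self) -> bool:
        """CMD1: report whether internal initialization has finished."""
        if self.state is State.IDLE:
            self._polls_left -= 1
            if self._polls_left <= 0:
                self.state = State.READY
        return self.state is not State.IDLE

    def cmd2(self) -> int:
        """CMD2: return the CID and move to Identification State."""
        assert self.state is State.READY
        self.state = State.IDENTIFICATION
        return self.cid

    def cmd3(self, rca: int) -> None:
        """CMD3: store the 16-bit RCA and enter Data Transfer Mode."""
        assert self.state is State.IDENTIFICATION
        assert 0 <= rca < (1 << 16)
        self.rca = rca
        self.state = State.DATA_TRANSFER


def identify(device: SimulatedEmmc, rca: int) -> int:
    """Controller side: poll CMD1, read the CID with CMD2, assign the RCA."""
    while not device.cmd1():
        pass
    cid = device.cmd2()
    device.cmd3(rca)
    return cid
```

Calling `identify(SimulatedEmmc(cid=0x1234), rca=1)` walks the device through Idle, Ready, and Identification State and leaves it in Data Transfer Mode with its RCA set.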

As the volume of stored data grows, a single eMMC can no longer meet storage requirements, and large-capacity SSDs (Solid State Disks) are urgently needed. Because an SSD built directly on Flash is too complicated, an SSD based on a RAIM (Redundant Array of Independent Modules) architecture of an eMMC array was developed to simplify SSD design; see Fig. 1.

The architecture consists of an SSD Controller and a plurality of channels; each channel carries a plurality of eMMCs, and the eMMCs on one channel share the CMD line. Fig. 2 shows channel 0 of Fig. 1, i.e., the eMMC cascade structure on channel 0.

When an SSD of this architecture is initialized, the SSD Controller sends the broadcast command CMD2 to request the CID from all devices on channel 0. The 3 eMMCs on channel 0 that are in Ready State all start sending their 128-bit CIDs on the CMD line. In each bit period the eMMCs drive the line in open-drain mode: a 0 is driven low, while a 1 is output as a high-impedance state. The bit value on the line is therefore the AND of the corresponding bits of all the CIDs. Each eMMC monitors the output bit stream bit by bit, and if in any bit period the bit it sent does not match the bit on the CMD line, it stops sending and waits for the next identification cycle.

Every CID is unique, so only one device can successfully send its complete CID to the Controller in one identification cycle, after which the Controller sends CMD3 to assign a relative device address (RCA) to that device. Once its RCA is set, an eMMC no longer responds in later identification cycles, and its output switches from open-drain to push-pull. As a result, the CIDs received by the Controller necessarily arrive in ascending order, but the Controller does not know which eMMC returned a given CID, nor which eMMC received the RCA it assigned. When the SSD fails, the SSD Controller therefore knows the RCA of the faulty eMMC but not its specific spatial position.

An alternative scheme reads the CIDs of all eMMCs before production and sorts the eMMCs by CID value before they go onto the production line, so that the RCAs in the finished product fall at positions given by a predetermined rule. The RCAs then correspond one to one with the eMMCs in the actual layout, but this scheme is very complicated to carry out in production and extremely inefficient, so it is generally not adopted. Locating faults in an eMMC array is currently very difficult.
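
The arbitration described above can be illustrated with a short simulation. It is a sketch under stated assumptions: the CID is taken to be sent most significant bit first (which is consistent with the ascending order stated above), RCAs are simply numbered from 0 in the order of identification, and the function names `arbitrate` and `identification_order` are hypothetical.

```python
CID_BITS = 128


def arbitrate(cids: list[int]) -> int:
    """Return the CID that survives one identification cycle.

    Every device drives its CID in open-drain mode, so the bus carries the
    AND of all driven bits. A device that reads back a bit different from
    the one it sent stops sending for the rest of the cycle.
    """
    active = list(cids)
    for bit in range(CID_BITS - 1, -1, -1):  # assumption: MSB first
        bus = 1
        for cid in active:
            bus &= (cid >> bit) & 1
        active = [cid for cid in active if ((cid >> bit) & 1) == bus]
    assert len(active) == 1, "CIDs are assumed to be unique"
    return active[0]


def identification_order(cids: list[int]) -> list[tuple[int, int]]:
    """Repeat cycles until every device has an RCA; return (rca, cid) pairs."""
    remaining = list(cids)
    order = []
    rca = 0
    while remaining:
        winner = arbitrate(remaining)
        order.append((rca, winner))
        remaining.remove(winner)  # a device with an RCA no longer responds
        rca += 1
    return order
```

For three devices placed on the board in the order `[0x30, 0x10, 0x20]`, `identification_order` assigns RCA 0 to CID `0x10`, RCA 1 to `0x20`, and RCA 2 to `0x30`: the RCA order follows the CID values, not the physical positions, which is why the RCA alone does not reveal where a chip sits.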

There are two existing approaches to detecting chip faults in an eMMC array: direct detection, and detection with dedicated optical inspection equipment.

The first scheme monitors the return status of each operation while the eMMC chips are initialized, erased, written, and read in sequence, analyzes the status returned for each command to determine which specific fault has occurred in an eMMC, and then outputs the CID or RCA of that eMMC.

The second scheme uses electron-beam fault detection. Under electron-beam defect inspection equipment, different devices inside a chip show clearly different brightness, and whether a fault has occurred can be determined by comparison against the chip wiring.

The first scheme can only output the CID or RCA of the faulty eMMC; it cannot determine the specific spatial position of the faulty eMMC, so the disk cannot be repaired.

The second scheme requires purchasing professional equipment that is expensive, complicated to operate, and dependent on trained operators, and its detection efficiency is low, so it is not suitable for locating a faulty eMMC.

Disclosure of Invention

The invention aims to solve the problem that a faulty eMMC in the eMMC array of a RAIM-architecture SSD is difficult to locate. It uses infrared imaging to capture a heat map of the working eMMC array and quickly distinguishes the positions of normal and faulty eMMC chips from the heat distribution.

The method comprises capturing a thermal image of the eMMCs with an infrared thermal imager and judging whether an eMMC is faulty according to a preset color range.

Preferably, the method specifically comprises the following steps:

S1, testing a plurality of SSD disks, selecting a faulty disk, and recording the faulty RCAs;

S2, numbering each eMMC in the faulty disk and formatting the faulty disk;

S3, measuring the color temperature of a normal disk in different working states as the reference for fault detection;

S4, selecting a non-faulty RCA from S1, writing data to it, and capturing an infrared image;

S5, examining the infrared image to find the normally working eMMC that corresponds to the RCA selected in S4;

S6, repeating S4 and S5 to test, for every non-faulty RCA on the SSD disk, whether the corresponding eMMC is a normal chip.

Preferably, testing the plurality of SSD disks, selecting a faulty disk, and recording the faulty RCAs specifically comprises: sending erase, write, and read commands in sequence to the eMMC corresponding to every RCA on the disk; after the erase, judging whether the erase failed by checking bit 0 on the data feedback line, where bit 0 being 0 indicates that the erase succeeded, and bit 0 still being 1 after the preset erase time has elapsed indicates an erase fault in the eMMC corresponding to that RCA; sending write and read commands, writing a specific preset value and reading it back for comparison, where a mismatch with the preset value indicates a write/read fault; considering the eMMC corresponding to the RCA faulty if any erase, write, or read fault occurs, and normal otherwise; and recording the state, normal or faulty, of every RCA on the disk.

Preferably, each eMMC in the faulty disk is numbered and the faulty disk is formatted. Formatting clears the data, which prevents the other idle eMMCs from performing garbage collection and heating up while one eMMC in the array is selected for operation, and thus prevents misjudgment.

Preferably, measuring the color temperature of a normal disk in different working states as the reference for fault detection specifically comprises: measuring the temperature and color temperature several times with an infrared imager; recording the temperature of the region where an eMMC is located in the idle, non-working state, together with the corresponding color range in the infrared image, as the low-temperature color; and recording the temperature of the region where an eMMC is located after writing data for 5 minutes, together with the corresponding color range in the infrared image, as the high-temperature color.

Preferably, selecting a non-faulty RCA from S1, writing data to it, and capturing an infrared image specifically comprises: selecting one RCA that showed no fault in S1 and writing data to it for 5 minutes, so that the eMMC corresponding to the RCA works and its temperature rises to a steady state; and then capturing an infrared image with an infrared thermometer.

Preferably, examining the infrared image to find the normally working eMMC that corresponds to the RCA selected in S4 specifically comprises: examining the infrared image, and if the color temperature range of the region where one eMMC is located matches the high-temperature color range from S3, concluding that this eMMC is working normally, that the chip is a normal chip, and that it corresponds to the RCA addressed in S4.

Preferably, repeating S4 and S5 to test, for every non-faulty RCA on the SSD disk, whether the corresponding eMMC is a normal chip specifically comprises: repeating S4 and S5 to test in turn whether the eMMC corresponding to each normal RCA on the disk is a normal chip; once all normal chips on the disk have been identified, the remaining chips are the faulty chips.

The beneficial effects of the invention at least comprise:

Faulty chips are located with infrared imaging: the effect that normally working and faulty eMMCs have on the surrounding temperature is compared, and the judgment is made directly from the infrared image. The prior art can only determine the RCA of a faulty eMMC, not its specific spatial position. Binding RCA addresses to actual spatial positions before production is complicated, and is not as intuitive, fast, and simple as reading an infrared image.

The method detects faulty chips directly and quickly while the eMMC array is working. It does not require replacing the firmware with detection firmware, binding RCAs to eMMC addresses, or using dedicated chip inspection equipment, which reduces both production cost and detection complexity.

Drawings

Fig. 1 is a block diagram of a prior-art solid state disk eMMC array structure based on the RAIM architecture;

Fig. 2 is a schematic diagram of the eMMC cascade structure on channel 0 in Fig. 1;

Fig. 3 is a flowchart of the method for locating a faulty eMMC in a RAIM-architecture SSD according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of infrared imaging of the eMMC array in the method for locating a faulty eMMC in a RAIM-architecture SSD according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

On the contrary, the invention is intended to cover alternatives, modifications, and equivalents that may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.

The method comprises capturing a thermal image of the eMMCs with an infrared thermal imager and judging whether an eMMC is faulty according to a preset color range.

Referring to Fig. 3, the method comprises the following steps:

S1, testing a plurality of SSD disks, selecting a faulty disk, and recording the faulty RCAs;

S2, numbering each eMMC in the faulty disk and formatting the faulty disk;

S3, measuring the color temperature of a normal disk in different working states as the reference for fault detection;

S4, selecting a non-faulty RCA from S1, writing data to it, and capturing an infrared image;

S5, examining the infrared image to find the normally working eMMC that corresponds to the RCA selected in S4;

S6, repeating S4 and S5 to test, for every non-faulty RCA on the SSD disk, whether the corresponding eMMC is a normal chip.

In a specific implementation of S1 (testing a plurality of SSD disks, selecting a faulty disk, and recording the faulty RCAs), erase, write, and read commands are sent in sequence to the eMMC corresponding to every RCA on the disk. After the erase, bit 0 on the data feedback line is checked to judge whether the erase failed: if bit 0 is 0, the erase succeeded; if bit 0 is still 1 after the preset erase time has elapsed, the eMMC corresponding to that RCA has an erase fault. Write and read commands are then sent: a specific preset value is written and read back for comparison, and a mismatch with the preset value indicates a write/read fault. If any erase, write, or read fault occurs, the eMMC corresponding to the RCA is considered faulty; otherwise it is considered normal. The state, normal or faulty, of every RCA on the disk is recorded.
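
The S1 check can be sketched as follows. This is a minimal illustration, not the firmware used by the inventors: `FakeEmmc` is a hypothetical stand-in for the eMMC behind one RCA, and the pattern `PATTERN` and the timeout `ERASE_TIMEOUT_S` are assumed values, since the text gives neither the preset value nor the preset erase time.

```python
import time

PATTERN = bytes([0xA5]) * 512  # hypothetical preset value for the write/read check
ERASE_TIMEOUT_S = 1.0          # hypothetical preset erase time, in seconds


class FakeEmmc:
    """Stand-in for the eMMC behind one RCA (hypothetical interface)."""

    def __init__(self, erase_ok: bool = True, corrupt: bool = False):
        self.erase_ok = erase_ok  # False: bit 0 stays at 1 after an erase
        self.corrupt = corrupt    # True: data read back differs from data written
        self._data = b""

    def erase(self) -> None:
        self._data = b""

    def status_bit0(self) -> int:
        return 0 if self.erase_ok else 1

    def write(self, data: bytes) -> None:
        self._data = data

    def read(self) -> bytes:
        if self.corrupt:
            return bytes(b ^ 0xFF for b in self._data)
        return self._data


def erase_succeeds(dev, timeout_s: float, poll_s: float = 0.001) -> bool:
    """Erase, then poll bit 0 until it reads 0 or the preset time runs out."""
    dev.erase()
    deadline = time.monotonic() + timeout_s
    while True:
        if dev.status_bit0() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def write_read_succeeds(dev, pattern: bytes = PATTERN) -> bool:
    """Write the preset value and compare it with what is read back."""
    dev.write(pattern)
    return dev.read() == pattern


def check_disk(devices: dict, timeout_s: float = ERASE_TIMEOUT_S) -> dict:
    """Return {rca: "normal" or "faulty"} for every RCA on one disk."""
    states = {}
    for rca, dev in devices.items():
        ok = erase_succeeds(dev, timeout_s) and write_read_succeeds(dev)
        states[rca] = "normal" if ok else "faulty"
    return states
```

A disk would be treated as faulty if `check_disk` reports at least one RCA as `"faulty"`; the RCAs reported as `"normal"` are the ones used later in S4.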

S2, numbering each eMMC in the faulty disk and formatting the faulty disk. Formatting clears the data, which prevents the other idle eMMCs from performing garbage collection and heating up while one eMMC in the array is selected for operation, and thus prevents misjudgment.

S3, measuring the color temperature of a normal disk in different working states as the reference for fault detection. Specifically, the temperature and color temperature are measured several times with an infrared imager. The temperature of the region where an eMMC is located in the idle, non-working state, together with the corresponding color range in the infrared image, is recorded as the low-temperature color. The temperature of the region where an eMMC is located after writing data for 5 minutes, together with the corresponding color range in the infrared image, is recorded as the high-temperature color.
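
One way to turn the repeated S3 measurements into reference ranges is sketched below. It is an assumption-laden illustration: the text does not say how the ranges are derived, so the function `calibrate`, the `margin_c` tolerance, and the use of temperatures in degrees Celsius in place of image colors are all hypothetical choices.

```python
def calibrate(idle_samples, busy_samples, margin_c=1.0):
    """Derive low- and high-temperature reference ranges from repeated readings.

    idle_samples: region temperatures measured while the eMMC is idle.
    busy_samples: region temperatures measured after 5 minutes of writing.
    margin_c: hypothetical tolerance added on both sides of each range.
    """
    low = (min(idle_samples) - margin_c, max(idle_samples) + margin_c)
    high = (min(busy_samples) - margin_c, max(busy_samples) + margin_c)
    if low[1] >= high[0]:
        # Overlapping ranges could not separate a working chip from an idle one.
        raise ValueError("idle and working ranges overlap; re-measure")
    return {"low": low, "high": high}
```

With idle readings of 30.0, 31.0, and 30.5 and working readings of 48.0, 50.0, and 49.0 (made-up numbers), the sketch yields a low range of 29.0 to 32.0 and a high range of 47.0 to 51.0.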

S4, selecting a non-faulty RCA from S1, writing data to it, and capturing an infrared image. Specifically, one RCA that showed no fault in S1 is selected and data is written to it for 5 minutes, so that the eMMC corresponding to the RCA works and its temperature rises to a steady state; an infrared image is then captured with an infrared thermometer.

S5, examining the infrared image to find the normally working eMMC that corresponds to the RCA selected in S4. Specifically, the image is examined, and if the color temperature range of the region where one eMMC is located matches the high-temperature color range from S3, this eMMC is working normally, the chip is a normal chip, and it corresponds to the RCA addressed in S4.
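
The text describes S5 as a visual inspection. If the thermal image were available as a grid of temperatures, the same comparison could be sketched as below; the grid representation, the bounding boxes in `regions`, and the function names are hypothetical and not part of the patent.

```python
def region_mean(image, box):
    """Mean temperature inside box = (row_start, row_end, col_start, col_end)."""
    r0, r1, c0, c1 = box
    cells = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(cells) / len(cells)


def hot_emmcs(image, regions, high_range):
    """Return the numbers of the eMMCs whose region falls in the high range.

    image: 2-D list of temperatures (an assumed form of the infrared image).
    regions: {emmc_number: box}, using the numbering assigned in S2.
    high_range: (low_bound, high_bound) of the high-temperature reference from S3.
    """
    lo, hi = high_range
    return [n for n, box in regions.items()
            if lo <= region_mean(image, box) <= hi]
```

If exactly one eMMC number is returned after writing to a single RCA, that chip is the one addressed by the RCA.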

S6, repeating S4 and S5 to test, for every non-faulty RCA on the SSD disk, whether the corresponding eMMC is a normal chip. Specifically, S4 and S5 are repeated to test in turn whether the eMMC corresponding to each normal RCA on the disk is a normal chip; once all normal chips on the disk have been identified, the remaining chips are the faulty chips.
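
The elimination in S6 amounts to a set difference, sketched below. The callback `find_hot_position` is a hypothetical placeholder for one pass of S4 and S5 (write to the RCA for 5 minutes, capture the image, and report which numbered eMMC is hot); it is not an API from the patent.

```python
def locate_faulty(all_positions, non_faulty_rcas, find_hot_position):
    """Map each non-faulty RCA to a chip position, then eliminate.

    all_positions: the eMMC numbers assigned in S2.
    non_faulty_rcas: the RCAs recorded as normal in S1.
    find_hot_position: callable rca -> eMMC number, or None if no region
        reaches the high-temperature range (hypothetical placeholder).
    """
    rca_to_position = {}
    for rca in non_faulty_rcas:
        position = find_hot_position(rca)  # one pass of S4 and S5
        if position is not None:
            rca_to_position[rca] = position
    # Every position that never heated up belongs to a faulty chip.
    faulty_positions = set(all_positions) - set(rca_to_position.values())
    return rca_to_position, faulty_positions
```

For example, with four numbered chips and observations in which RCA 0 heats eMMC 2, RCA 1 heats eMMC 4, and RCA 3 heats eMMC 1, the only position left over is eMMC 3, which is reported as faulty.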

Fig. 4 shows the infrared image of an eMMC array disk in the normal working state; infrared imaging is used to locate faulty eMMC chips. Whenever current passes through a conductor or semiconductor, part of the electrical energy is converted into heat. A chip consists mainly of transistors, which represent binary 0 and 1 by switching. In a normally working eMMC chip, a considerable number of transistors switch hundreds of millions of times per second; each switch consumes current and generates heat, and because heat is dissipated far more slowly than it is generated, a high-temperature region forms. An eMMC chip that cannot work because of a fault generates no heat from bit flipping, so its temperature is relatively low.

An infrared thermal imager converts the invisible infrared energy emitted by an object into a visible thermal image. It uses an infrared detector and an optical imaging objective to receive the infrared radiation distribution of the measured object and project it onto the photosensitive element of the detector, producing an infrared thermal image that corresponds to the heat distribution on the surface of the object; different colors in the thermal image represent different temperatures of the measured object.

The host selects RCA0 and sends it a read/write task. The internal controller of eMMC2 receives the command and starts storing data, so the chip starts operating and generating heat. On the infrared thermometer, the color of the region where eMMC2 is located corresponds to a higher temperature than the color of the eMMCs in the idle state. The chip at the physical position corresponding to RCA0 is therefore eMMC2, and eMMC2 is a normal chip.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
