Automated injection verification method and system for hardware faults of a multi-control storage device

Document No.: 1875312 Publication date: 2021-11-23

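Reading note: This technology, "Automated injection verification method and system for hardware faults of a multi-control storage device", was designed by 宋以强 (Song Yiqiang) on 2021-07-16. Main content: The invention provides an automated injection verification method and system for hardware faults of a multi-control storage device. The server side generates and deletes a hardware fault injection file, and the storage device reads the file and modifies the chassis data stored in pages to create the fault. By modifying the chassis state information returned by the hardware management chip, the generation and disappearance of various hardware faults are simulated and the soundness and reliability of the storage-side software logic are verified. Verifying the software logic no longer depends on hardware, so fault injection is decoupled from the hardware; no fly wires or other physical modifications are needed, which saves time, labor, and material resources.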

1. An automated injection verification method for hardware faults of a multi-control storage device, the method comprising the following operations:

constructing a JSON data structure that stores the fault injection information for different hardware faults;

establishing, at a server side, a connection with the storage device, obtaining the chassis type of the storage device, reading the JSON file that stores the hardware fault injection information according to the chassis type, and injecting the faults one by one;

generating, on the storage device, a fault injection file according to the fault injection information, the storage device performing fault injection according to the fault injection file, and, while waiting for the alarm to be generated, polling from the server side whether the storage device has generated the fault;

and, after the fault is generated, deleting the fault injection file from the storage device and querying whether the alarm has disappeared, wherein the fault injection succeeds if the fault disappears and fails otherwise.

2. The method according to claim 1, wherein the storage device performing fault injection according to the fault injection file specifically comprises:

adding an interface to the chassis management module of the storage software on the storage device, reading the fault injection file through the interface, and modifying the page data of the chassis management chip according to the data read, so as to create the fault.

3. The method according to claim 2, wherein, after obtaining a page, the interface reads the fault injection file already present on the storage device and checks whether the page is the one that requires fault injection; if so, the interface modifies the corresponding position of the original page according to the number of bytes to be modified, the modified value, and the offset, and the interface call completes.

4. The method according to claim 1, wherein the JSON file comprises alarm event numbers and, under each alarm event number, a chassis type, a sequence number, a row number, a column number, the type of hardware to be fault-injected, the number of such hardware units in one chassis, the chassis management protocol type, the page number to be modified, the number of bytes of data to be injected, the modified value, and the offset of the value to be modified.

5. The method according to claim 1, wherein the first line of the fault injection file represents the page, the second line represents the number of modified bytes, the third line represents the modified value, and the fourth line represents the offset of the value to be modified within the page.

6. The method according to claim 1, wherein, when polling whether the storage device has generated the fault, the server side compares the alarm event number of the alarm and the ID of the faulty hardware with the injected fault to determine whether the faulty hardware is the hardware into which the fault was injected.

7. The method according to claim 1, wherein querying whether the alarm has disappeared specifically comprises querying, at the server side, whether the auto_fix field is YES; if it is YES, the fault has disappeared, and otherwise the fault still exists.

8. An automated injection verification system for hardware failures of a multi-control storage device, the system comprising:

a fault injection data construction module, configured to construct a JSON data structure that stores the fault injection information for different hardware faults;

a connection establishing module, configured to establish, at the server side, a connection with the storage device, obtain the chassis type of the storage device, read the JSON file that stores the hardware fault injection information according to the chassis type, and inject the faults one by one;

a fault injection module, configured to generate, on the storage device, a fault injection file according to the fault injection information, have the storage device perform fault injection according to the fault injection file, and, while waiting for the alarm to be generated, poll from the server side whether the storage device has generated the fault;

and a fault elimination module, configured to, after the fault is generated, delete the fault injection file from the storage device and query whether the alarm has disappeared, wherein the fault injection succeeds if the fault disappears and fails otherwise.

9. The system according to claim 8, wherein the JSON file comprises alarm event numbers and, under each alarm event number, a chassis type, a sequence number, a row number, a column number, the type of hardware to be fault-injected, the number of such hardware units in one chassis, the chassis management protocol type, the page number to be modified, the number of bytes of data to be injected, the modified value, and the offset of the value to be modified.

10. The system according to claim 8, wherein the first line of the fault injection file represents the page, the second line represents the number of modified bytes, the third line represents the modified value, and the fourth line represents the offset of the value to be modified within the page.

Technical Field

The invention relates to the technical field of server storage, in particular to an automatic injection verification method and system for hardware faults of multi-control storage equipment.

Background

A complete storage device consists of software and hardware. The hardware mainly comprises PSUs, controllers, a backplane, hard disks, cooling fans, and the like, and any of these parts can fail in use. When a fault occurs, the storage software must respond promptly and report an alarm to notify the user that the device is abnormal, so that the user can carry out maintenance in time. The software therefore needs reliable logic: it must report an alarm promptly and accurately when hardware fails, and it must cancel the reported alarm when the hardware fault disappears. To ensure this reliability, hardware faults have to be created frequently during self-testing after the code is written, in order to verify the logic and reliability of the code. Hardware faults, however, are difficult to create: hardware links have to be modified with fly wires, and forcing some hardware faults also damages the hardware, which raises research and development costs and wastes resources.

Disclosure of Invention

The invention aims to provide an automated injection verification method and system for hardware faults of a multi-control storage device, so as to solve the problems in the prior art that hardware faults are costly to create and require complicated hardware modification, thereby saving time, labor, and material resources and avoiding physical modification.

In order to achieve the technical purpose, the invention provides an automatic injection verification method for hardware faults of a multi-control storage device, which comprises the following operations:

constructing a JSON data structure that stores the fault injection information for different hardware faults;

establishing, at a server side, a connection with the storage device, obtaining the chassis type of the storage device, reading the JSON file that stores the hardware fault injection information according to the chassis type, and injecting the faults one by one;

generating, on the storage device, a fault injection file according to the fault injection information, the storage device performing fault injection according to the fault injection file, and, while waiting for the alarm to be generated, polling from the server side whether the storage device has generated the fault;

and, after the fault is generated, deleting the fault injection file from the storage device and querying whether the alarm has disappeared, wherein the fault injection succeeds if the fault disappears and fails otherwise.

Preferably, the storage device performing fault injection according to the fault injection file specifically includes:

adding an interface to the chassis management module of the storage software on the storage device, reading the fault injection file through the interface, and modifying the page data of the chassis management chip according to the data read, so as to create the fault.

Preferably, after obtaining a page, the interface reads the fault injection file already present on the storage device and checks whether the page is the one that requires fault injection; if so, it modifies the corresponding position of the original page according to the number of bytes to be modified, the modified value, and the offset, and the interface call completes.

Preferably, the JSON file includes alarm event numbers and, under each alarm event number, a chassis type, a sequence number, a row number, a column number, the type of hardware to be fault-injected, the number of such hardware units in one chassis, the chassis management protocol type, the page number to be modified, the number of bytes of data to be injected, the modified value, and the offset of the value to be modified.

Preferably, the first line of the fault injection file represents the page, the second line represents the number of modified bytes, the third line represents the modified value, and the fourth line represents the offset of the value to be modified within the page.

Preferably, when polling whether the storage device has generated the fault, the server side compares the alarm event number of the alarm and the ID of the faulty hardware with the injected fault to determine whether the faulty hardware is the hardware into which the fault was injected.

Preferably, querying whether the alarm has disappeared specifically means querying, at the server side, whether the auto_fix field is YES; if it is YES, the fault has disappeared, and otherwise the fault still exists.

The invention also provides an automated injection verification system for hardware faults of a multi-control storage device, the system comprising:

a fault injection data construction module, configured to construct a JSON data structure that stores the fault injection information for different hardware faults;

a connection establishing module, configured to establish, at the server side, a connection with the storage device, obtain the chassis type of the storage device, read the JSON file that stores the hardware fault injection information according to the chassis type, and inject the faults one by one;

a fault injection module, configured to generate, on the storage device, a fault injection file according to the fault injection information, have the storage device perform fault injection according to the fault injection file, and, while waiting for the alarm to be generated, poll from the server side whether the storage device has generated the fault;

and a fault elimination module, configured to, after the fault is generated, delete the fault injection file from the storage device and query whether the alarm has disappeared, wherein the fault injection succeeds if the fault disappears and fails otherwise.

Preferably, the JSON file includes alarm event numbers and, under each alarm event number, a chassis type, a sequence number, a row number, a column number, the type of hardware to be fault-injected, the number of such hardware units in one chassis, the chassis management protocol type, the page number to be modified, the number of bytes of data to be injected, the modified value, and the offset of the value to be modified.

Preferably, the first line of the fault injection file represents the page, the second line represents the number of modified bytes, the third line represents the modified value, and the fourth line represents the offset of the value to be modified within the page.

The effects described in this summary are only those of the embodiments, not all effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:

Compared with the prior art, the invention generates and deletes the hardware fault injection file from the server side, and the storage device reads the file and modifies the chassis data stored in the pages to create the fault. Modifying the chassis state information returned by the hardware management chip in this way simulates the generation and disappearance of various hardware faults and verifies the soundness and reliability of the storage-side software logic. Verifying the software logic no longer depends on hardware, so fault injection is decoupled from the hardware; no fly wires or other physical modifications are needed, which saves time, labor, and material resources.

Drawings

FIG. 1 is a flowchart of an automated injection verification method for hardware failure of a multi-control storage device according to an embodiment of the present invention;

FIG. 2 is a block diagram of an automated injection verification system for hardware failure of a multi-control storage device according to an embodiment of the present invention.

Detailed Description

In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.

The following describes a method and a system for automatic injection verification of hardware faults of a multi-control storage device according to embodiments of the present invention in detail with reference to the accompanying drawings.

As shown in FIG. 1, the invention discloses an automatic injection verification method for hardware faults of a multi-control storage device, which comprises the following operations:

constructing a JSON data structure that stores the fault injection information for different hardware faults;

establishing, at a server side, a connection with the storage device, obtaining the chassis type of the storage device, reading the JSON file that stores the hardware fault injection information according to the chassis type, and injecting the faults one by one;

generating, on the storage device, a fault injection file according to the fault injection information, the storage device performing fault injection according to the fault injection file, and, while waiting for the alarm to be generated, polling from the server side whether the storage device has generated the fault;

and, after the fault is generated, deleting the fault injection file from the storage device and querying whether the alarm has disappeared, wherein the fault injection succeeds if the fault disappears and fails otherwise.

This embodiment of the invention targets devices that use the standard chassis management protocol (the SES protocol). By modifying the chassis state information returned by the hardware management chip, it simulates the generation and disappearance of various hardware faults and verifies the soundness and reliability of the storage-side software logic.

The hardware fault injection configuration file is generated and deleted by code implemented with the Pytest framework in Python. This code runs on a server that can connect over SSH to every node of the storage device in order to execute commands on those nodes.

A JSON data structure is designed to store the fault injection information. For each hardware fault, the information needed to generate the hardware fault injection file is identified and placed in the JSON file. Because a chassis contains many types of hardware, all fault injection entries use the same JSON structure, which makes the server-side code highly reusable.

The JSON file consists of multiple alarm event numbers, each corresponding to one hardware fault. Under each alarm event number is stored the information required to inject that fault: the chassis type, sequence number, row number, column number, type of hardware to be fault-injected, number of such hardware units in one chassis, chassis management protocol type, page number to be modified, number of bytes of data to be injected, modified value, and offset of the value to be modified. The sequence number is used to retrieve the hardware fault alarm; it is incrementing, so each newly generated alarm receives a sequence number one greater than the previous one. The row number and column number are used to locate the faulty hardware. The management chip stores the various chassis information in pages, and the pages follow the SES protocol.
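The layout described above can be sketched as follows. This is only an illustration: the source lists which fields are stored but gives neither key names nor values, so every key, the event number "085001", the chassis type "2U12", and all numeric values below are hypothetical placeholders.

```python
import json

# Hypothetical example file: one alarm event number mapping to the fields
# the description lists. All keys and values are placeholders.
FAULT_CASES_JSON = """
{
  "085001": {
    "chassis_type": "2U12",
    "sequence_number": 0,
    "row": 0,
    "column": 1,
    "hardware_type": "psu",
    "hardware_count": 2,
    "protocol": "SES",
    "page": "0x02",
    "byte_count": 1,
    "value": "0x40",
    "offset": 12
  }
}
"""

REQUIRED_FIELDS = {
    "chassis_type", "sequence_number", "row", "column", "hardware_type",
    "hardware_count", "protocol", "page", "byte_count", "value", "offset",
}


def load_fault_cases(text: str, chassis_type: str) -> dict:
    """Return {alarm_event_number: info} for the given chassis type."""
    selected = {}
    for event_number, info in json.loads(text).items():
        missing = REQUIRED_FIELDS.difference(info)
        if missing:
            raise ValueError(f"event {event_number} lacks fields: {sorted(missing)}")
        if info["chassis_type"] == chassis_type:
            selected[event_number] = info
    return selected
```

Because every entry shares one structure, the same loader serves all hardware types; only the data changes from fault to fault.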

When the server-side script runs, it calls the Python paramiko library to establish a connection with the storage device. Once connected, the script obtains the chassis list of the storage end and traverses it. For each chassis it obtains the chassis type, reads the JSON file used for fault injection, and injects the faults one by one. For each fault, a fault injection file is generated at the storage end from the fault injection information: the first line of the file gives the page, the second line the number of modified bytes, the third line the modified value, and the fourth line the offset of the value to be modified within the page. After a fault is generated, the script compares the alarm event number of the alarm and the ID of the faulty hardware with the injected fault to confirm that the alarm belongs to the hardware into which the fault was injected. Once the corresponding alarm event has been generated, the script records the alarm sequence number, and the server side executes a command at the storage end through paramiko to delete the fault injection file there. The alarm query command, lseventlog followed by the alarm sequence number, is then executed every few seconds up to a certain upper limit, checking whether the auto_fix field in the result is YES. If it is, the fault has disappeared; the script records a success log and moves on to the next hardware fault injection test case. If the polling limit is reached and the auto_fix field is still NO, the script records a failure log and moves on to the next case. When all hardware fault cases for all chassis have been executed, the script exits.
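A minimal sketch of one inject-and-verify cycle, under stated assumptions, might look like this. The remote command runner is passed in as a plain callable `run`; in the setup the description outlines it would wrap an SSH call such as paramiko's `SSHClient.exec_command`. The file path, the dictionary keys, and the two alarm-check callables are hypothetical names introduced for illustration.

```python
import shlex
import time
from typing import Callable

FAULT_FILE = "/tmp/hw_fault_inject.conf"  # hypothetical path on the storage node


def render_fault_file(info: dict) -> str:
    """Four lines: page, number of modified bytes, modified value, offset."""
    keys = ("page", "byte_count", "value", "offset")  # hypothetical key names
    return "".join(f"{info[key]}\n" for key in keys)


def wait_until(check: Callable[[], bool], attempts: int, interval: float) -> bool:
    """Call check() up to `attempts` times, pausing `interval` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt + 1 < attempts:
            time.sleep(interval)
    return False


def run_case(run: Callable[[str], object], info: dict,
             alarm_raised: Callable[[], bool], alarm_cleared: Callable[[], bool],
             attempts: int = 10, interval: float = 3.0) -> bool:
    """Inject one fault, wait for its alarm, remove the file, wait for the alarm to clear."""
    content = shlex.quote(render_fault_file(info))
    path = shlex.quote(FAULT_FILE)
    run(f"printf %s {content} > {path}")       # create the fault injection file
    try:
        if not wait_until(alarm_raised, attempts, interval):
            return False                       # alarm never appeared: injection failed
    finally:
        run(f"rm -f {path}")                   # always remove the injection file
    return wait_until(alarm_cleared, attempts, interval)
```

Removing the file even when no alarm appears is a choice made in this sketch so that a failed case does not leak into the next one; the description only states that the file is deleted after the fault is generated.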

To read the hardware fault injection configuration file and modify the data, an interface is added, on top of the original chassis management logic, to the chassis management module of the storage software. The interface reads the hardware fault injection file and modifies the data of the chassis management chip according to what it reads, thereby creating the fault.

A switch for the logic verification is added at the storage end in the form of an environment variable: the code logic for verifying hardware faults is triggered when the environment variable exists and is not triggered otherwise. The newly added interface code is isolated in this way so that it does not affect the logic of the original software system. The interface is triggered only when the storage-side software system is running normally, the hardware of every chassis has no fault alarm, and the environment variable used for logic verification is in effect. Under certain conditions, the EN module in the software collects, one by one, the pages assembled by the hardware chip. The new interface must be placed at the point where the SCSI RECEIVE DIAGNOSTIC RESULTS command returns success; the content returned by this command can be any page. After a page is obtained, the interface reads the hardware fault injection configuration file already present at the storage end and checks whether the page is the one that requires fault injection. If so, it modifies the original page at the corresponding position according to the number of bytes to be modified, the modified value, and the offset, and the interface call completes.
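The page-patching step can be illustrated with the sketch below. It is written in Python for consistency with the server-side tooling; the storage software's real interface is not shown in the source and is likely implemented differently. The environment variable name, the file path, and the big-endian encoding of multi-byte values are assumptions. The page is identified by its first byte, which holds the page code in SCSI diagnostic pages.

```python
import os

ENV_SWITCH = "HW_FAULT_INJECT"             # hypothetical environment variable name
FAULT_FILE = "/tmp/hw_fault_inject.conf"   # hypothetical path of the injection file


def parse_fault_file(text: str) -> tuple:
    """Read the four lines: page code, byte count, modified value, offset."""
    fields = text.split()
    page_code, count, value, offset = (int(field, 0) for field in fields[:4])
    return page_code, count, value, offset


def patch_page(page: bytes, fault_text: str) -> bytes:
    """Return the page with the fault bytes written in, or unchanged if not targeted."""
    page_code, count, value, offset = parse_fault_file(fault_text)
    if not page or page[0] != page_code:   # byte 0 of a diagnostic page is its page code
        return page
    if offset + count > len(page):
        raise ValueError("fault bytes fall outside the page")
    patched = bytearray(page)
    # Assumption: multi-byte values are written most significant byte first.
    patched[offset:offset + count] = value.to_bytes(count, "big")
    return bytes(patched)


def on_page_received(page: bytes) -> bytes:
    """Hook to call after RECEIVE DIAGNOSTIC RESULTS returns success."""
    if ENV_SWITCH not in os.environ or not os.path.exists(FAULT_FILE):
        return page                        # switch off or no file: original behavior
    with open(FAULT_FILE) as handle:
        return patch_page(page, handle.read())
```

With the switch unset the hook returns the page untouched, which mirrors the isolation requirement: the original chassis management logic is unaffected unless verification is explicitly enabled.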

The alarm query command used is lseventlog. Run on its own, it lists all alarm information; followed by a specific alarm sequence number, it displays the details of that alarm. The alarm information includes an auto_fix field, which records whether the alarm has disappeared (no means the fault still exists, yes means the hardware fault has disappeared), and a two-bit array, which records the location of the faulty device.
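Checking the auto_fix field could be sketched as below. The source does not show the output format of lseventlog, so the layout assumed here, one field per line with the field name followed by whitespace and its value, is hypothetical, as are the sample field names other than auto_fix.

```python
def parse_event_detail(output: str) -> dict:
    """Split assumed 'name value' lines into a dictionary of fields."""
    fields = {}
    for line in output.splitlines():
        parts = line.split(None, 1)        # field name, then the rest of the line
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def fault_cleared(output: str) -> bool:
    """True when the auto_fix field reads yes (case-insensitive)."""
    return parse_event_detail(output).get("auto_fix", "").lower() == "yes"
```

A missing auto_fix field is treated as "not cleared", so an unexpected output format leads to a recorded failure and not a false success.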

The invention generates and deletes the hardware fault injection file from the server side, and the storage device reads the file and modifies the chassis data stored in the pages. Modifying the chassis state information returned by the hardware management chip in this way simulates the generation and disappearance of various hardware faults and verifies the soundness and reliability of the storage-side software logic. Verifying the software logic no longer depends on hardware, so fault injection is decoupled from the hardware; no fly wires or other physical modifications are needed, which saves time, labor, and material resources.

As shown in FIG. 2, an embodiment of the present invention further discloses an automated injection verification system for hardware faults of a multi-control storage device, the system comprising:

a fault injection data construction module, configured to construct a JSON data structure that stores the fault injection information for different hardware faults;

a connection establishing module, configured to establish, at the server side, a connection with the storage device, obtain the chassis type of the storage device, read the JSON file that stores the hardware fault injection information according to the chassis type, and inject the faults one by one;

a fault injection module, configured to generate, on the storage device, a fault injection file according to the fault injection information, have the storage device perform fault injection according to the fault injection file, and, while waiting for the alarm to be generated, poll from the server side whether the storage device has generated the fault;

and a fault elimination module, configured to, after the fault is generated, delete the fault injection file from the storage device and query whether the alarm has disappeared, wherein the fault injection succeeds if the fault disappears and fails otherwise.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
