Distributed storage slow disk processing method, system, terminal and storage medium

Document No.: 1815419. Publication date: 2021-11-09.

Abstract: Distributed storage slow disk processing method, system, terminal and storage medium, designed by 赵闪闪 on 2021-06-25. The invention provides a slow disk processing method, system, terminal and storage medium for distributed storage. The method comprises: performing slow disk detection on a distributed storage cluster, and screening out and marking slow disks; and determining whether isolating a marked slow disk would exceed the cluster fault domain. If so, only a slow disk fault alarm is reported; if not, a slow disk fault alarm is reported and the slow disk is isolated. Because each marked slow disk is checked against the fault domain rules before isolation, and a disk whose isolation would exceed the fault domain only triggers an alarm, the invention avoids the loss of availability and the fault escalation that isolating slow disks directly can cause. It thereby mitigates the adverse effect of slow disks on system performance while preserving the high availability of the distributed file system.

1. A slow disk processing method for distributed storage, characterized by comprising the following steps:

performing slow disk detection on a distributed storage cluster, and screening out and marking slow disks;

determining whether isolating a marked slow disk would exceed the cluster fault domain:

if so, reporting a slow disk fault alarm;

if not, reporting a slow disk fault alarm and isolating the slow disk.

2. The method of claim 1, wherein performing slow disk detection on the distributed storage cluster and screening out and marking slow disks comprises:

setting a detection count, a latency value and a threshold;

detecting the disks of the distributed storage cluster in a loop according to the detection count, and recording the actual latency value of each disk obtained by each detection;

comparing the actual latency value of a disk with the set latency value, and counting the current detection of the disk as one valid detection if the actual latency value reaches the set latency value;

counting the total number of valid detections of the disk after the loop detection is finished, and determining that the disk is a slow disk if the total number of valid detections exceeds the threshold.

3. The method of claim 1, wherein determining whether isolating the marked slow disk would exceed the cluster fault domain comprises:

acquiring the number of failed nodes allowed by the fault domain of the replica pool where the slow disk is located;

acquiring the cluster running state, and acquiring the abnormality type of the cluster if the cluster state is abnormal;

if the abnormality type is a node fault, determining whether the number of failed nodes has reached the allowed number of failed nodes, and if so, not isolating the slow disk;

if the abnormality type is a disk fault, determining whether the slow disk and the failed disk are in the same placement group, and if so, not isolating the slow disk.

4. The method of claim 3, wherein acquiring the number of failed nodes allowed by the fault domain of the replica pool where the slow disk is located comprises:

storing in advance the allowed number of failed nodes corresponding to each type of replica pool;

acquiring the type of the replica pool where the slow disk is located, and looking up the allowed number of failed nodes corresponding to the slow disk according to the type of the replica pool.

5. A slow disk processing system for distributed storage, comprising:

a slow disk detection unit, configured to perform slow disk detection on a distributed storage cluster, and to screen out and mark slow disks;

a rule judging unit, configured to determine whether isolating a marked slow disk would exceed the cluster fault domain;

an alarm reporting unit, configured to report a slow disk fault alarm if isolating the marked slow disk would exceed the cluster fault domain;

a slow disk isolation unit, configured to report a slow disk fault alarm and isolate the slow disk if isolating the marked slow disk would not exceed the cluster fault domain.

6. The system of claim 5, wherein the slow disk detection unit comprises:

a parameter setting module, configured to set a detection count, a latency value and a threshold;

a loop detection module, configured to detect the disks of the distributed storage cluster in a loop according to the detection count, and to record the actual latency value of each disk obtained by each detection;

a valid counting module, configured to compare the actual latency value of a disk with the set latency value, and to count the current detection of the disk as one valid detection if the actual latency value reaches the set latency value;

a threshold comparison module, configured to count the total number of valid detections of the disk after the loop detection is finished, and to determine that the disk is a slow disk if the total number of valid detections exceeds the threshold.

7. The system according to claim 5, wherein the rule judging unit comprises:

a standard acquisition module, configured to acquire the number of failed nodes allowed by the fault domain of the replica pool where the slow disk is located;

an abnormality acquisition module, configured to acquire the cluster running state, and to acquire the abnormality type of the cluster if the cluster state is abnormal;

a first processing module, configured to determine, if the abnormality type is a node fault, whether the number of failed nodes has reached the allowed number of failed nodes, and not to isolate the slow disk if it has;

a second processing module, configured to determine, if the abnormality type is a disk fault, whether the slow disk and the failed disk are in the same placement group, and not to isolate the slow disk if they are.

8. The system of claim 7, wherein the standard acquisition module comprises:

a type pre-storage submodule, configured to store in advance the allowed number of failed nodes corresponding to each type of replica pool;

a quantity lookup submodule, configured to acquire the type of the replica pool where the slow disk is located, and to look up the allowed number of failed nodes corresponding to the slow disk according to the type of the replica pool.

9. A terminal, comprising:

a processor;

a memory for storing instructions for execution by the processor;

wherein the processor is configured to perform the method of any one of claims 1-4.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.

Technical Field

The invention relates to the technical field of distributed storage, in particular to a slow disk processing method, a system, a terminal and a storage medium for distributed storage.

Background

A Distributed File System (DFS) is a file system in which the physical storage resources managed by the file system are not necessarily attached directly to the local node, but are connected to nodes (which may be simply understood as computers) through a computer network; alternatively, it is a complete hierarchical file system formed by combining several different logical disk partitions or volume labels. DFS provides a logical tree-structured file system for resources distributed anywhere on the network, so that users can access shared files distributed across the network more conveniently. An individual DFS shared folder serves as an access point to other shared folders on the network.

A slow disk in a distributed file system is a hard disk whose performance is lower than that of the other disks in the storage system. The presence of a slow disk can affect the performance of the RAID group and even of the entire business system. To keep the performance of the storage system stable, slow disk detection must be performed on the distributed file system in time and detected slow disks must be replaced. In existing distributed file systems, a slow disk is isolated as soon as it is detected, in order to reduce its influence on system performance.

However, although a slow disk does affect system performance, isolating slow disks indiscriminately harms the availability of the distributed file system and undermines its stability.

Disclosure of Invention

In view of the above defects in the prior art, the present invention provides a slow disk processing method, system, terminal and storage medium for distributed storage, so as to solve the technical problem of the stability degradation of a distributed file system caused by directly isolating a slow disk.

In a first aspect, the present invention provides a slow disk processing method for distributed storage, including:

performing slow disk detection on a distributed storage cluster, and screening out and marking slow disks;

determining whether isolating a marked slow disk would exceed the cluster fault domain:

if so, reporting a slow disk fault alarm;

if not, reporting a slow disk fault alarm and isolating the slow disk.

Further, performing slow disk detection on the distributed storage cluster and screening out and marking slow disks includes:

setting a detection count, a latency value and a threshold;

detecting the disks of the distributed storage cluster in a loop according to the detection count, and recording the actual latency value of each disk obtained by each detection;

comparing the actual latency value of a disk with the set latency value, and counting the current detection of the disk as one valid detection if the actual latency value reaches the set latency value;

counting the total number of valid detections of the disk after the loop detection is finished, and determining that the disk is a slow disk if the total number of valid detections exceeds the threshold.

Further, determining whether isolating the marked slow disk would exceed the cluster fault domain includes:

acquiring the number of failed nodes allowed by the fault domain of the replica pool where the slow disk is located;

acquiring the cluster running state, and acquiring the abnormality type of the cluster if the cluster state is abnormal;

if the abnormality type is a node fault, determining whether the number of failed nodes has reached the allowed number of failed nodes, and if so, not isolating the slow disk;

if the abnormality type is a disk fault, determining whether the slow disk and the failed disk are in the same placement group, and if so, not isolating the slow disk.

Further, acquiring the number of failed nodes allowed by the fault domain of the replica pool where the slow disk is located includes:

storing in advance the allowed number of failed nodes corresponding to each type of replica pool;

acquiring the type of the replica pool where the slow disk is located, and looking up the allowed number of failed nodes corresponding to the slow disk according to the type of the replica pool.

In a second aspect, the present invention provides a slow disk processing system for distributed storage, comprising:

a slow disk detection unit, configured to perform slow disk detection on a distributed storage cluster, and to screen out and mark slow disks;

a rule judging unit, configured to determine whether isolating a marked slow disk would exceed the cluster fault domain;

an alarm reporting unit, configured to report a slow disk fault alarm if isolating the marked slow disk would exceed the cluster fault domain;

a slow disk isolation unit, configured to report a slow disk fault alarm and isolate the slow disk if isolating the marked slow disk would not exceed the cluster fault domain.

Further, the slow disk detection unit includes:

a parameter setting module, configured to set a detection count, a latency value and a threshold;

a loop detection module, configured to detect the disks of the distributed storage cluster in a loop according to the detection count, and to record the actual latency value of each disk obtained by each detection;

a valid counting module, configured to compare the actual latency value of a disk with the set latency value, and to count the current detection of the disk as one valid detection if the actual latency value reaches the set latency value;

a threshold comparison module, configured to count the total number of valid detections of the disk after the loop detection is finished, and to determine that the disk is a slow disk if the total number of valid detections exceeds the threshold.

Further, the rule judging unit includes:

a standard acquisition module, configured to acquire the number of failed nodes allowed by the fault domain of the replica pool where the slow disk is located;

an abnormality acquisition module, configured to acquire the cluster running state, and to acquire the abnormality type of the cluster if the cluster state is abnormal;

a first processing module, configured to determine, if the abnormality type is a node fault, whether the number of failed nodes has reached the allowed number of failed nodes, and not to isolate the slow disk if it has;

a second processing module, configured to determine, if the abnormality type is a disk fault, whether the slow disk and the failed disk are in the same placement group, and not to isolate the slow disk if they are.

Further, the standard acquisition module includes:

a type pre-storage submodule, configured to store in advance the allowed number of failed nodes corresponding to each type of replica pool;

a quantity lookup submodule, configured to acquire the type of the replica pool where the slow disk is located, and to look up the allowed number of failed nodes corresponding to the slow disk according to the type of the replica pool.

In a third aspect, a terminal is provided, including:

a processor and a memory, wherein

the memory is used for storing a computer program, and

the processor is used for calling and running the computer program from the memory, so that the terminal executes the method described above.

In a fourth aspect, a computer storage medium is provided, having stored therein instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.

The beneficial effects of the invention are as follows.

In the slow disk processing method for distributed storage provided by the invention, detected slow disks are marked, and each marked slow disk is checked against the fault domain rules to determine whether isolating it would stay within the fault domain. If isolation would exceed the fault domain, only a slow disk fault alarm is generated and the slow disk is not isolated. This avoids the loss of availability and the fault escalation that isolating slow disks directly can cause, so that the adverse effect of slow disks on system performance is addressed while the high availability of the distributed file system is preserved.

In the slow disk processing system for distributed storage provided by the invention, the slow disk detection unit marks detected slow disks, and the rule judging unit checks each marked slow disk against the fault domain rules to determine whether isolating it would stay within the fault domain. If isolation would exceed the fault domain, only a slow disk fault alarm is generated and the slow disk is not isolated. This avoids the loss of availability and the fault escalation that isolating slow disks directly can cause, so that the adverse effect of slow disks on system performance is addressed while the high availability of the distributed file system is preserved.

The terminal provided by the invention marks detected slow disks and checks each marked slow disk against the fault domain rules to determine whether isolating it would stay within the fault domain. If isolation would exceed the fault domain, the terminal only generates a slow disk fault alarm and does not isolate the slow disk. This avoids the loss of availability and the fault escalation that isolating slow disks directly can cause, so that the adverse effect of slow disks on system performance is addressed while the high availability of the distributed file system is preserved.

The storage medium provided by the invention stores a program for executing the slow disk processing method for distributed storage. The program marks detected slow disks and checks each marked slow disk against the fault domain rules to determine whether isolating it would stay within the fault domain. If isolation would exceed the fault domain, only a slow disk fault alarm is generated and the slow disk is not isolated. This avoids the loss of availability and the fault escalation that isolating slow disks directly can cause, so that the adverse effect of slow disks on system performance is addressed while the high availability of the distributed file system is preserved.

In addition, the invention has a reliable design principle, a simple structure and very broad application prospects.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.

FIG. 2 is another schematic flow diagram of a method of one embodiment of the invention.

FIG. 3 is a schematic block diagram of a system of one embodiment of the present invention.

Fig. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The following explains key terms appearing in the present invention.

Fault domain: a fault domain is created and one or more nodes are assigned to it, so that nodes that may fail at the same time are grouped together. The failure of all nodes in a single fault domain is treated as a single fault. If fault domains are defined, multiple copies of the same object are not placed in the same fault domain.
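As an illustration of the placement constraint in this definition, the following minimal Python sketch checks that the copies of one object fall in distinct fault domains. The function name `copies_in_distinct_domains`, the node names and the node-to-domain mapping are hypothetical and are not part of the original disclosure.

```python
from typing import Dict, Sequence


def copies_in_distinct_domains(
    copy_nodes: Sequence[str], node_domain: Dict[str, str]
) -> bool:
    """Return True if no two copies of an object share a fault domain."""
    domains = [node_domain[node] for node in copy_nodes]
    # Duplicates collapse in the set, so equal lengths mean all distinct.
    return len(domains) == len(set(domains))


# Hypothetical layout: two nodes grouped in rack-1, one node in rack-2.
node_domain = {"node-a": "rack-1", "node-b": "rack-1", "node-c": "rack-2"}
print(copies_in_distinct_domains(["node-a", "node-c"], node_domain))  # True
print(copies_in_distinct_domains(["node-a", "node-b"], node_domain))  # False
```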

FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention. The execution subject of FIG. 1 may be a slow disk processing system for distributed storage.

As shown in FIG. 1, the method includes:

step 110, performing slow disk detection on the distributed storage cluster, and screening out and marking slow disks;

step 120, determining whether isolating a marked slow disk would exceed the cluster fault domain;

step 130, if so, reporting a slow disk fault alarm;

step 140, if not, reporting a slow disk fault alarm and isolating the slow disk.
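Steps 110 to 140 can be sketched as a small decision loop. This is a minimal Python sketch under stated assumptions, not the patented implementation: the function `process_slow_disks`, the action labels `"alarm_only"` and `"alarm_and_isolate"`, and the callback `would_exceed_fault_domain` are names introduced here for illustration only.

```python
from typing import Callable, List, Tuple


def process_slow_disks(
    marked_slow_disks: List[str],
    would_exceed_fault_domain: Callable[[str], bool],
) -> List[Tuple[str, str]]:
    """Decide, for every marked slow disk, which action to take."""
    decisions = []
    for disk in marked_slow_disks:
        if would_exceed_fault_domain(disk):
            # Step 130: isolation would exceed the fault domain, alarm only.
            decisions.append((disk, "alarm_only"))
        else:
            # Step 140: the alarm is reported and the disk is isolated.
            decisions.append((disk, "alarm_and_isolate"))
    return decisions
```

An alarm is raised in both branches; the fault domain check only decides whether isolation is added on top of it.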

To facilitate understanding of the present invention, the slow disk processing method for distributed storage is further described below, following its principle and the slow disk handling process of the embodiment.

After the slow disk detection switch is turned on, the system enters the slow disk detection process. Once the disks that meet the conditions have been marked as slow disks, the fault domain of the cluster is checked: a slow disk whose isolation would exceed the fault domain only triggers a slow disk alarm, while a slow disk whose isolation would not exceed the fault domain is isolated at the same time as the alarm is reported, so that both the performance and the availability of the storage system are preserved.

Specifically, referring to FIG. 2, the slow disk processing method for distributed storage includes:

S1, performing slow disk detection on the distributed storage cluster, and screening out and marking slow disks.

A detection count, a latency value and a threshold are set. The disks of the distributed storage cluster are detected in a loop according to the detection count, and the actual latency value of each disk obtained by each detection is recorded. The actual latency value of a disk is compared with the set latency value, and the current detection of the disk is counted as one valid detection if the actual latency value reaches the set latency value. After the loop detection is finished, the total number of valid detections of the disk is counted, and the disk is determined to be a slow disk if that total exceeds the threshold.

Specifically, the slow disk detection switch is turned on, and the detection count, the latency value and the threshold are set. Slow disk detection is performed on all disks in the distributed cluster for the set number of detections and the results are recorded (a result is considered valid when the actual latency reaches the set latency value). One period is complete when the detection count is reached, and a slow disk fault is considered to have occurred when the number of valid results accumulated in one period exceeds the threshold. For example, with a detection count of 20, a latency value of 1000 ms and a threshold of 18, a disk is considered a slow disk if its latency reaches 1000 ms in more than 18 of 20 consecutive detections.
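The counting rule of one detection period can be sketched in Python as follows. This is only an illustrative sketch: `detect_slow_disk` and the `probe_latency_ms` callback (standing in for whatever mechanism measures one disk latency sample) are assumed names, and the defaults reuse the example values of 20 detections, 1000 ms and a threshold of 18.

```python
from typing import Callable


def detect_slow_disk(
    probe_latency_ms: Callable[[], float],
    detection_count: int = 20,
    latency_limit_ms: float = 1000.0,
    threshold: int = 18,
) -> bool:
    """Run one detection period and report whether the disk is slow."""
    valid = 0
    for _ in range(detection_count):
        # A detection is valid when the measured latency reaches the limit.
        if probe_latency_ms() >= latency_limit_ms:
            valid += 1
    # The disk is slow only if the valid count exceeds the threshold.
    return valid > threshold


# 19 of 20 probes reach 1000 ms, which exceeds the threshold of 18.
samples = iter([1200.0] * 19 + [50.0])
print(detect_slow_disk(lambda: next(samples)))  # True
```

With these values, exactly 18 valid detections would not mark the disk, because the count must exceed the threshold rather than merely reach it.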

Each detected slow disk is marked, and the detailed information of the marked slow disk is obtained, such as its hard disk number, the replica pool it belongs to and its group within the fault domain.

S2, determining whether isolating the marked slow disk would exceed the cluster fault domain: if so, reporting a slow disk fault alarm; if not, reporting a slow disk fault alarm and isolating the slow disk.

The allowed number of failed nodes corresponding to each type of replica pool is stored in advance; the type of the replica pool where the slow disk is located is acquired, and the corresponding allowed number of failed nodes is looked up according to that type. This yields the number of failed nodes allowed by the fault domain of the replica pool where the slow disk is located. The cluster running state is then acquired, and if the cluster state is abnormal, the abnormality type of the cluster is acquired. If the abnormality type is a node fault, it is determined whether the number of failed nodes has reached the allowed number of failed nodes. If it has, the slow disk is not isolated. If it has not, the number of slow disks, the number of failed nodes and the allowed number of failed nodes are compared further, that is, it is determined whether the sum of the number of slow disks and the number of failed nodes exceeds the allowed number of failed nodes: if it does not, the slow disk is isolated; otherwise, the slow disk is not isolated. If the abnormality type is a disk fault, it is determined whether the slow disk and the failed disk are in the same placement group, and if so, the slow disk is not isolated.

Specifically, when a slow disk fault is detected, the cluster fault domain is checked before the slow disk is isolated. If isolating the slow disk (which is equivalent to removing the disk from the cluster) would exceed the fault domain of the cluster, only a slow disk fault alarm is reported and the disk is not isolated. If isolating the slow disk would not exceed the fault domain, the disk is isolated at the same time as the slow disk fault alarm is reported.

Fault domain judging mechanism: for a replica pool with N copies, the system allows at most N/2 (rounded down) nodes to fail at the same time. For an erasure-coded pool with a K+M rule, the system allows at most M nodes to fail at the same time. The allowed number of failed nodes is therefore obtained according to the type of the pool where the slow disk is located.
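The two rules can be written as a small lookup, shown here as a hedged Python sketch. The `PoolType` class, its field names and the `"replica"` / `"erasure"` labels are assumptions made for illustration; only the N/2 (rounded down) and M limits come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolType:
    kind: str   # "replica" or "erasure" (hypothetical labels)
    n: int = 0  # number of copies in a replica pool
    k: int = 0  # number of data chunks in a K+M erasure-coded pool
    m: int = 0  # number of parity chunks in a K+M erasure-coded pool


def allowed_failed_nodes(pool: PoolType) -> int:
    """Return the number of nodes that may fail at the same time."""
    if pool.kind == "replica":
        return pool.n // 2  # N/2, rounded down
    if pool.kind == "erasure":
        return pool.m       # at most M failed nodes for a K+M rule
    raise ValueError(f"unknown pool kind: {pool.kind!r}")


print(allowed_failed_nodes(PoolType("replica", n=2)))       # 1
print(allowed_failed_nodes(PoolType("erasure", k=2, m=1)))  # 1
```

Both example pools used in this embodiment, the two-copy pool and the 2+1 erasure-coded pool, therefore tolerate a single failed node.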

Take a two-replica pool as an example. In a cluster using the two-replica policy, the maximum allowed number of failed nodes is 1. If the cluster state is normal, slow disk isolation is allowed. If the cluster state is abnormal, it must be determined whether the abnormality is a node fault or a disk fault. If it is a node fault, slow disk isolation is not allowed, because a node fault already exists and any additional fault would exceed the fault domain. If it is a disk fault, it is determined whether the failed disk and the disk detected as slow belong to the same placement group: if they do, slow disk isolation is not performed; if they do not, slow disk isolation is allowed. (In a two-replica pool, a placement group spans two disks: one stores the original data and the other stores the backup copy. If both disks fail at the same time, the fault domain is exceeded.)

Take a 2+1 erasure-coded pool as an example. In a cluster using the 2+1 erasure coding policy, the maximum allowed number of failed nodes is 1. If the cluster state is normal, slow disk isolation is allowed. If the cluster state is abnormal, it must be determined whether the abnormality is a node fault or a disk fault. If it is a node fault, slow disk isolation is not allowed, because a node fault already exists and any additional fault would exceed the fault domain. If it is a disk fault, it is determined whether the failed disk and the disk detected as slow belong to the same placement group: if they do, slow disk isolation is not performed; if they do not, slow disk isolation is allowed. (In a 2+1 erasure-coded pool, a placement group spans three disks: two store data and one stores parity. Only one of the three disks may fail; the failure of two or more disks exceeds the fault domain.)
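The isolation decision walked through in these two examples can be sketched in Python as follows. This is a simplified sketch, not the claimed implementation: `ClusterFault`, `ClusterState` and `should_isolate` are hypothetical names, and modelling each disk by the set of placement group identifiers it participates in is an assumption made here for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class ClusterFault(Enum):
    NONE = "none"  # cluster state is normal
    NODE = "node"  # abnormality type: node fault
    DISK = "disk"  # abnormality type: disk fault


@dataclass
class ClusterState:
    fault: ClusterFault = ClusterFault.NONE
    failed_nodes: int = 0
    failed_disk_pgs: Set[int] = field(default_factory=set)


def should_isolate(
    slow_disk_pgs: Set[int],
    allowed_failed_nodes: int,
    state: ClusterState,
    slow_disk_count: int = 1,
) -> bool:
    """Return True if the slow disk may be isolated, False for alarm only."""
    if state.fault is ClusterFault.NONE:
        return True
    if state.fault is ClusterFault.NODE:
        if state.failed_nodes >= allowed_failed_nodes:
            return False
        # Slow disks plus failed nodes must not exceed the allowed count.
        return slow_disk_count + state.failed_nodes <= allowed_failed_nodes
    # Disk fault: do not isolate if a placement group is shared.
    return slow_disk_pgs.isdisjoint(state.failed_disk_pgs)


# Two-copy pool (allowed = 1) with one node already failed: alarm only.
print(should_isolate({1, 2}, 1, ClusterState(ClusterFault.NODE, failed_nodes=1)))  # False
```

In both example pools the allowed number of failed nodes is 1, so any existing node fault blocks isolation, while an existing disk fault blocks it only when a placement group is shared with the slow disk.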

In this embodiment, after a disk in the storage system is detected and confirmed as a slow disk, the fault domain judging mechanism is started. According to the storage pool policy in use in the cluster, it is determined whether isolating the slow disk would exceed the fault domain. A slow disk whose isolation would exceed the fault domain only triggers a slow disk alarm, while a slow disk whose isolation would not exceed the fault domain is isolated at the same time as the alarm is reported, so that both the performance and the availability of the storage system are preserved.

As shown in FIG. 3, the system 300 includes:

a slow disk detection unit, configured to perform slow disk detection on a distributed storage cluster, and to screen out and mark slow disks;

a rule judging unit, configured to determine whether isolating a marked slow disk would exceed the cluster fault domain;

an alarm reporting unit, configured to report a slow disk fault alarm if isolating the marked slow disk would exceed the cluster fault domain;

a slow disk isolation unit, configured to report a slow disk fault alarm and isolate the slow disk if isolating the marked slow disk would not exceed the cluster fault domain.

Optionally, as an embodiment of the present invention, the slow disk detection unit includes:

a parameter setting module, configured to set a detection count, a latency value and a threshold;

a loop detection module, configured to detect the disks of the distributed storage cluster in a loop according to the detection count, and to record the actual latency value of each disk obtained by each detection;

a valid counting module, configured to compare the actual latency value of a disk with the set latency value, and to count the current detection of the disk as one valid detection if the actual latency value reaches the set latency value;

a threshold comparison module, configured to count the total number of valid detections of the disk after the loop detection is finished, and to determine that the disk is a slow disk if the total number of valid detections exceeds the threshold.

Optionally, as an embodiment of the present invention, the rule determining unit includes:

the system comprises a standard acquisition module, a failure domain acquisition module and a failure domain management module, wherein the standard acquisition module is used for acquiring the number of failure nodes allowed by a failure domain of a copy pool where a slow disk is located;

the abnormal acquisition module is used for acquiring the running state of the cluster, and acquiring the abnormal type of the cluster if the cluster state is abnormal;

the first processing module is used for judging whether the number of the fault nodes reaches the allowed number of the fault nodes or not if the abnormal type is the node fault, and not isolating the slow disk if the number of the fault nodes reaches the allowed number of the fault nodes;

and the second processing module is used for judging whether the slow disk and the fault disk are in the same placement group if the abnormal type is the disk fault, and if so, not isolating the slow disk.
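The two processing modules can be sketched as a single decision function. This is an illustrative sketch, not the patented implementation: the names `AnomalyType` and `may_isolate`, and the representation of placement groups as integer IDs, are assumptions. The passage above does not say what happens when the cluster is healthy or when the abnormal type is neither of the two listed; the sketch assumes isolation is allowed in the first case and refused in the second.

```python
from enum import Enum
from typing import Optional, Set


class AnomalyType(Enum):
    NODE_FAULT = "node_fault"
    DISK_FAULT = "disk_fault"
    OTHER = "other"  # hypothetical catch-all, not named in the text


def may_isolate(
    anomaly: Optional[AnomalyType],  # None means the cluster state is normal
    failed_node_count: int,
    allowed_failed_nodes: int,
    slow_disk_pgs: Set[int],         # placement groups containing the slow disk
    failed_disk_pgs: Set[int],       # placement groups containing a fault disk
) -> bool:
    """Return True if isolating the slow disk stays within the fault domain."""
    if anomaly is None:
        return True  # assumption: a healthy cluster can absorb the isolation
    if anomaly is AnomalyType.NODE_FAULT:
        # Do not isolate once the allowed number of fault nodes is reached.
        return failed_node_count < allowed_failed_nodes
    if anomaly is AnomalyType.DISK_FAULT:
        # Do not isolate if the slow disk shares a placement group with a fault disk.
        return slow_disk_pgs.isdisjoint(failed_disk_pgs)
    return False  # assumption: be conservative for unlisted abnormal types
```

When the function returns False, only the alarm would be reported and the slow disk would stay in service.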

Optionally, as an embodiment of the present invention, the standard acquisition module includes:

the type pre-storage submodule is used for pre-storing the number of allowed fault nodes corresponding to each of multiple types of copy pools;

and the quantity searching submodule is used for acquiring the type of the copy pool where the slow disk is positioned and searching the quantity of the allowed fault nodes corresponding to the slow disk according to the type of the copy pool.
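The two submodules describe a pre-stored lookup table keyed by copy pool type. A minimal sketch follows; the pool type names and the numbers in the table are placeholder values chosen only for illustration and are not taken from the original text, and the function name is likewise hypothetical.

```python
# Pre-stored mapping from copy pool type to the allowed number of fault nodes.
# The keys and values are illustrative placeholders, not values from the source.
ALLOWED_FAULT_NODES = {
    "2-copy": 1,
    "3-copy": 2,
}


def allowed_fault_nodes(pool_type: str) -> int:
    """Look up the allowed number of fault nodes for the slow disk's pool type."""
    try:
        return ALLOWED_FAULT_NODES[pool_type]
    except KeyError:
        # Fail loudly rather than guess a limit for an unknown pool type.
        raise ValueError(f"unknown copy pool type: {pool_type!r}") from None
```

Keeping the table separate from the decision logic means a new pool type can be supported by adding one entry.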

In the distributed storage slow disk processing system provided by the invention, the slow disk detection unit marks each detected slow disk, and the rule judgment unit applies the fault domain rules to the marked slow disk to judge whether isolating it would stay within the fault domain. If isolation would not stay within the fault domain, only a slow disk fault alarm is generated and the slow disk is not isolated. This avoids the loss of availability that directly isolating the slow disk would cause and prevents a fault in the distributed file system from spreading. The adverse effect of slow disks on system performance is thus addressed while the high availability of the distributed file system is preserved.

Fig. 4 is a schematic structural diagram of a terminal 400 according to an embodiment of the present invention, where the terminal 400 may be used to execute the slow disk processing method for distributed storage according to the embodiment of the present invention.

The terminal 400 may include a processor 410, a memory 420, and a communication unit 430. The components communicate via one or more buses. Those skilled in the art will appreciate that the structure of the terminal shown in the figure is not limiting: it may be a bus architecture or a star architecture, may include more or fewer components than those shown, may combine certain components, or may arrange the components differently.

The memory 420 may be used for storing instructions executed by the processor 410. The memory 420 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The executable instructions in the memory 420, when executed by the processor 410, enable the terminal 400 to perform some or all of the steps in the method embodiments.

The processor 410 is the control center of the terminal. It connects the various parts of the entire terminal using various interfaces and lines, and performs the functions of the terminal and/or processes data by running or executing software programs and/or modules stored in the memory 420 and calling data stored in the memory. The processor may be composed of integrated circuits (ICs), for example a single packaged IC, or a plurality of connected packaged ICs with the same or different functions. For example, the processor 410 may include only a central processing unit (CPU). In the embodiment of the present invention, the CPU may have a single computing core or multiple computing cores.

The communication unit 430 is configured to establish a communication channel so that the terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to other terminals.

The terminal marks each detected slow disk and applies the fault domain rules to it, judging whether isolating the marked slow disk would stay within the fault domain. If it would not, the terminal only generates a slow disk fault alarm and does not isolate the slow disk. This prevents the availability of the distributed file system from being reduced by directly isolating the slow disk and prevents a fault from spreading, so the adverse effect of slow disks on performance is addressed while the high availability of the distributed file system is ensured.

The present invention also provides a computer storage medium. The computer storage medium may store a program which, when executed, may perform some or all of the steps in the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Therefore, the invention marks each detected slow disk, applies the fault domain rules to the marked slow disk, and judges whether isolating it would stay within the fault domain. If it would not, only a slow disk fault alarm is generated and the slow disk is not isolated. This avoids the reduction in availability caused by directly isolating the slow disk and prevents a fault in the distributed file system from spreading, so the adverse effect of slow disks on system performance is addressed while the high availability of the distributed file system is ensured.

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product. The computer software product is stored in a storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and includes instructions for enabling a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, or the like) to perform all or part of the steps of the method in the embodiments of the present invention.

For parts that are the same or similar across the embodiments in this specification, the embodiments may be referred to one another. In particular, because the terminal embodiment is basically similar to the method embodiment, its description is relatively brief, and the relevant points can be found in the description of the method embodiment.

In the embodiments provided in the present invention, it should be understood that the disclosed system and method can be implemented in other ways. The system embodiments described above are merely illustrative. For example, the division of the units is only one logical functional division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems, or units, and may be electrical, mechanical, or of another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from the spirit and scope of the present invention, and any changes or substitutions that a person skilled in the art can easily conceive of within the technical scope disclosed herein fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
