Time sequence database cluster and fault processing and operating method and device thereof

Document No.: 1952393 Publication date: 2021-12-10

Abstract: This technology, "Time sequence database cluster and fault processing and operating method and device thereof", was designed by Yang Guanfei on 2021-08-17. The application relates to a time series database cluster and to a fault handling method, an operating method, and apparatuses thereof. The time series database cluster comprises at least one metadata node and a plurality of data nodes; the data nodes are replicas of one another; the metadata node stores metadata, and the metadata comprises at least the host names of the data nodes. The method comprises: when any data node is detected to have failed, creating a new virtual node; copying the configuration file of the failed data node to the virtual node and mounting the data disk of the failed data node to the virtual node; and, in the host name mapping file of the time series database cluster, updating the IP address in the target mapping relation to the IP address of the virtual node, so that the virtual node provides data services in place of the failed data node. In this way, when any data node in the time series database cluster fails, failover can be performed without modifying the metadata, which makes the time series database cluster highly available.

1. A method for fault handling in a time series database cluster, the time series database cluster comprising: at least one metadata node, a plurality of data nodes; the data nodes are used for storing time sequence data, and the data nodes are copies of each other; the metadata node is used for storing metadata of the time sequence database cluster, and the metadata at least comprises a host name of each data node; the method comprises the following steps:

creating a new virtual node when any data node is detected to have failed;

copying the configuration file of the failed data node to the virtual node, and mounting the data disk of the failed data node to the virtual node;

updating, in a host name mapping file of the time series database cluster, the IP address in a target mapping relation to the IP address of the virtual node, wherein the target mapping relation is the mapping relation corresponding to the failed data node; and

restarting the time series database cluster so that the virtual node provides data services in place of the failed data node.

2. The method of claim 1, wherein the time series database cluster further comprises a liveness detection component configured to periodically send liveness detection messages to each of the data nodes;

detecting that any one of the data nodes has failed comprises:

for each data node, if no response message returned by the data node in reply to the liveness detection message is received within the current timeout period, determining that the data node has failed, wherein the current timeout period is the period that starts at the moment the liveness detection component most recently sent the liveness detection message and lasts for a set duration; or,

for each data node, if no heartbeat message sent by the data node has been received when a set period elapses, determining that the data node has failed.

3. A method of operating a time series database cluster, the method being applied to a time series database cluster according to any one of claims 1 to 2, the method comprising:

when a data operation request is received, determining a target data node to be operated from the time sequence database cluster;

acquiring the host name of the target data node from the metadata;

determining a target IP address corresponding to the acquired host name from the host name mapping file;

and executing data operation corresponding to the data operation request on the target data node according to the target IP address and the target time sequence data to be operated.

4. The method of claim 3, wherein determining a target data node to be operated on from the time series database cluster comprises:

determining a target shard group to which the target time series data belongs based on a timestamp of the target time series data, wherein the timestamp falls within the time range formed by the start time and the end time of the target shard group;

determining all the shards contained in the target shard group;

and determining the data nodes bound to the shards as the data nodes to be operated on.

5. A time series database cluster, characterized in that it comprises at least: at least one metadata node, a plurality of data nodes, a liveness detection component, and a virtual node creation component;

the data nodes are used for storing time sequence data, and the data nodes are copies of each other;

the metadata node is used for storing metadata of the time sequence database cluster, and the metadata at least comprises a host name of each data node;

the liveness detection component is used for detecting the state of the data nodes, wherein the state is either normal or failed;

the virtual node creation component is used for creating a new virtual node when the liveness detection component detects that any data node has failed; copying the configuration file of the failed data node to the virtual node, and mounting the data disk of the failed data node to the virtual node; updating, in a host name mapping file of the time series database cluster, the IP address in a target mapping relation to the IP address of the virtual node, wherein the target mapping relation is the mapping relation corresponding to the failed data node; and restarting the time series database cluster so that the virtual node provides data services in place of the failed data node.

6. The cluster of claim 5, wherein the data disks are cloud disks.

7. A failure handling apparatus of a time series database cluster, the time series database cluster comprising: at least one metadata node, a plurality of data nodes; the data nodes are used for storing time sequence data, and the data nodes are copies of each other; the metadata node is used for storing metadata of the time sequence database cluster, and the metadata at least comprises a host name of each data node; the device comprises:

the node newly building module is used for newly building a virtual node when detecting that any data node has a fault;

the node configuration module is used for copying the configuration file of the failed data node to the virtual node and mounting the data disk of the failed data node to the virtual node;

the mapping module is used for updating an IP address in a target mapping relation into the IP address of the virtual node in a host name mapping file of the time sequence database cluster, wherein the target mapping relation refers to the mapping relation corresponding to the data node with the fault;

and the restarting module is used for restarting the time sequence database cluster so as to enable the virtual node to replace the failed data node to provide data service.

8. An operating device for a time series database cluster, which is applied to the time series database cluster according to any one of claims 5 to 6, the device comprising:

the target determining module is used for determining a target data node to be operated from the time sequence database cluster when a data operation request is received;

a first obtaining module, configured to obtain a host name of the target data node from the metadata;

the second acquisition module is used for determining a target IP address corresponding to the acquired host name from the host name mapping file;

and the data operation module is used for executing data operation corresponding to the data operation request on the target data node according to the target IP address and the target time sequence data to be operated.

9. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the steps of the method of any one of claims 1 to 4 when executing a program stored in the memory.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.

Technical Field

The present application relates to the field of time sequence databases, and in particular, to a time sequence database cluster, and a fault processing method, an operation method, and an apparatus thereof.

Background

Time series data is a data sequence in which values of the same metric are recorded in chronological order. It is common in IT infrastructure, operations and maintenance monitoring systems, and the Internet of Things. Accordingly, a time series database is a specific type of database used primarily to store time series data.

In practice, implementing a time series database cluster is a very complicated task. At a minimum it must achieve high availability; that is, when a data node in the cluster fails, the cluster must be able to heal itself.

Disclosure of Invention

The application provides a time sequence database cluster and a fault processing method, an operation method and a device thereof, which are used for improving the reliability and the stability of the time sequence database cluster.

In a first aspect, the present application provides a method for handling a failure in a time series database cluster, where the time series database cluster includes: at least one metadata node, a plurality of data nodes; the data nodes are used for storing time sequence data, and the data nodes are copies of each other; the metadata node is used for storing metadata of the time sequence database cluster, and the metadata at least comprises a host name of each data node; the method comprises the following steps:

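creating a new virtual node when any data node is detected to have failed;

copying the configuration file of the failed data node to the virtual node, and mounting the data disk of the failed data node to the virtual node;

updating, in a host name mapping file of the time series database cluster, the IP address in a target mapping relation to the IP address of the virtual node, wherein the target mapping relation is the mapping relation corresponding to the failed data node; and

restarting the time series database cluster so that the virtual node provides data services in place of the failed data node.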

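In a possible implementation, the time series database cluster further includes a liveness detection component configured to periodically send a liveness detection message to each data node;

detecting that any one of the data nodes has failed comprises:

for each data node, if no response message returned by the data node in reply to the liveness detection message is received within the current timeout period, determining that the data node has failed, wherein the current timeout period is the period that starts at the moment the liveness detection component most recently sent the liveness detection message and lasts for a set duration; or,

for each data node, if no heartbeat message sent by the data node has been received when a set period elapses, determining that the data node has failed.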

In a second aspect, the present application provides a method for operating a time series database cluster, which is applied to the time series database cluster described in any one of the first aspect, and the method includes:

when a data operation request is received, determining a target data node to be operated from the time sequence database cluster;

acquiring the host name of the target data node from the metadata;

determining a target IP address corresponding to the acquired host name from the host name mapping file;

and executing data operation corresponding to the data operation request on the target data node according to the target IP address and the target time sequence data to be operated.

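In a possible implementation, determining a target data node to be operated on from the time series database cluster includes:

determining a target shard group to which the target time series data belongs based on a timestamp of the target time series data, wherein the timestamp falls within the time range formed by the start time and the end time of the target shard group;

determining all the shards contained in the target shard group;

and determining the data nodes bound to the shards as the data nodes to be operated on.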

In a third aspect, the present application provides a time series database cluster, comprising at least: at least one metadata node, a plurality of data nodes, a liveness detection component, and a virtual node creation component;

the data nodes are used for storing time sequence data, and the data nodes are copies of each other;

the metadata node is used for storing metadata of the time sequence database cluster, and the metadata at least comprises a host name of each data node;

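the liveness detection component is used for detecting the state of the data nodes, wherein the state is either normal or failed;

the virtual node creation component is used for creating a new virtual node when the liveness detection component detects that any data node has failed; copying the configuration file of the failed data node to the virtual node, and mounting the data disk of the failed data node to the virtual node; updating, in a host name mapping file of the time series database cluster, the IP address in a target mapping relation to the IP address of the virtual node, wherein the target mapping relation is the mapping relation corresponding to the failed data node; and restarting the time series database cluster so that the virtual node provides data services in place of the failed data node.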

In one possible implementation, the data disk is a cloud disk.

In a fourth aspect, the present application provides a failure handling apparatus for a timing database cluster, the timing database cluster including: at least one metadata node, a plurality of data nodes; the data nodes are used for storing time sequence data, and the data nodes are copies of each other; the metadata node is used for storing metadata of the time sequence database cluster, and the metadata at least comprises a host name of each data node; the device comprises:

the node newly building module is used for newly building a virtual node when detecting that any data node has a fault;

the node configuration module is used for copying the configuration file of the failed data node to the virtual node and mounting the data disk of the failed data node to the virtual node;

the mapping module is used for updating, in a host name mapping file of the time series database cluster, the IP address in a target mapping relation to the IP address of the virtual node, wherein the target mapping relation is the mapping relation corresponding to the failed data node;

and the restarting module is used for restarting the time sequence database cluster so as to enable the virtual node to replace the failed data node to provide data service.

In a fifth aspect, the present application provides an operating apparatus for a timing database cluster, which is applied to the timing database cluster described in any one of the third aspect, the apparatus includes:

the target determining module is used for determining a target data node to be operated from the time sequence database cluster when a data operation request is received;

a first obtaining module, configured to obtain a host name of the target data node from the metadata;

the second acquisition module is used for determining a target IP address corresponding to the acquired host name from the host name mapping file;

and the data operation module is used for executing, on the target data node, the data operation corresponding to the data operation request according to the target IP address and the target time series data to be operated on.

In a sixth aspect, the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;

a memory for storing a computer program;

a processor for implementing the steps of the method of any one of the first aspect when executing a program stored in the memory.

In a seventh aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of any of the first aspects.

Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:

According to the technical solution provided by the embodiments of the present application, the time series database cluster contains a plurality of data nodes. When any data node is detected to have failed, a new virtual node is created, the configuration file of the failed data node is copied to the virtual node, and the data disk of the failed data node is mounted to the virtual node, so that the new virtual node can provide data services in place of the failed data node. Failover is thereby achieved and the time series database cluster is highly available. Furthermore, the nodes participating in cluster creation are configured by host name when the cluster is established, and after failover the new virtual node inherits the host name of the original data node. The metadata of the time series database cluster therefore does not need to be changed during failover, which improves failover efficiency; and because the metadata remains stable, the reliability and stability of the time series database cluster are also improved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.

FIG. 1 is a schematic diagram of the architecture of a time series database cluster according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a fault handling method for a time series database cluster according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of an operating method for a time series database cluster according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a fault handling apparatus for a time series database cluster according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an operating apparatus for a time series database cluster according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

FIG. 1 is a schematic diagram of the architecture of a time series database cluster according to an embodiment of the present application. As shown in FIG. 1, the time series database cluster 10 includes a metadata node 101, data nodes 102 to 104, a liveness detection component 105, and a virtual node creation component 106.

The metadata node 101 is configured to store metadata of the time series database cluster, where the metadata may include, but is not limited to: metadata node information, data node information, time series database information, and the like. In the embodiment of the present application, the node information includes at least the host name of the node and, optionally, may further include the IP address, alias, and the like of the node. This means that, when the time series database cluster 10 illustrated in FIG. 1 is established, the nodes participating in the creation of the cluster are configured at least by host name. An example of a data node configuration file is as follows:

hostname = "cluster.influxdb.238"

[clusterx]

# cluster node

# joins = "10.69.58.54:8091,10.69.32.214:8091,10.69.32.51:8091"

joins = "cluster.influxdb.54:8091,cluster.influxdb.214:8091,cluster.influxdb.51:8091"

In the configuration file, hostname is the host name of the data node, and joins lists the node information of the three other nodes in the time series database cluster 10, that is, the nodes other than the data node that this configuration file describes.
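
As a rough illustration of how a joins entry of this form might be consumed, the following Python sketch splits the value into (host name, port) pairs. The helper name `parse_joins` is hypothetical and is not part of any database's API; the sample value is the one from the configuration file above.

```python
def parse_joins(value):
    """Split a comma-separated "host:port" list into (host, port) tuples."""
    peers = []
    for item in value.split(","):
        # rpartition splits on the last ":" so dotted host names stay intact
        host, _, port = item.strip().rpartition(":")
        peers.append((host, int(port)))
    return peers


joins = "cluster.influxdb.54:8091,cluster.influxdb.214:8091,cluster.influxdb.51:8091"
for host, port in parse_joins(joins):
    print(host, port)
```

Because the peers are listed by host name rather than by IP address, nothing in this entry has to change when a node is replaced.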

Further, the time series database cluster also has a host name mapping file. The host name mapping file, also known as the /etc/hosts file, is a file stored in ASCII format in the /etc/ directory that provides fast resolution between IP addresses and host names (or domain names). It contains the mappings between IP addresses and host names and may also contain host aliases. When the time series database cluster is accessed, the host name mapping file is queried to resolve a host name to its IP address, which makes access fast and convenient. An example of a host name mapping file is as follows:

10.69.58.54 cluster.influxdb.54

10.69.32.214 cluster.influxdb.214

10.69.32.51 cluster.influxdb.51

10.69.32.238 cluster.influxdb.238
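
The lookup that such a file supports can be sketched in a few lines of Python. This is only an illustration under the assumption that each entry has the usual "IP address, host name, optional aliases" layout; the function name `parse_hosts` is hypothetical.

```python
def parse_hosts(text):
    """Parse hosts-file text into a {name: ip} dict (aliases included)."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


HOSTS_EXAMPLE = """\
10.69.58.54 cluster.influxdb.54
10.69.32.214 cluster.influxdb.214
10.69.32.51 cluster.influxdb.51
10.69.32.238 cluster.influxdb.238
"""

hosts = parse_hosts(HOSTS_EXAMPLE)
print(hosts["cluster.influxdb.238"])  # 10.69.32.238
```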

The data nodes 102 to 104 are replicas of one another and are used to store time series data. Storing the time series data in multiple replicas improves data reliability, which in turn improves the stability and reliability of the time series database cluster illustrated in FIG. 1.

Furthermore, the data node provides time sequence data storage service by mounting a data disk. Optionally, the data disk is a cloud disk. It can be understood that by setting the data disk as a cloud disk, the local storage space of the data node can be greatly reduced.

It should be noted that, in practice, the timing database cluster may include at least one metadata node and a plurality of data nodes, and fig. 1 only includes one metadata node and three data nodes as an example. And, the metadata node and the data node may be the same node, that is, the node is used for storing both the time series data and the metadata of the time series database cluster, and fig. 1 only illustrates that the metadata node and the data node are independent. In addition, when the time series database cluster comprises two or more metadata nodes, the two or more metadata nodes are backups of each other, and through the setting, the metadata reliability can be improved, and the stability and the reliability of the time series database cluster are further improved.

The liveness detection component 105 is configured to detect the state of the nodes in the time series database cluster; in this embodiment it mainly detects the state of the data nodes. The state of a node is either normal or failed. How the liveness detection component 105 detects the node state is described in detail in the embodiment shown in FIG. 2 below and is not repeated here.

The virtual node creation component 106 is configured to create a new virtual node when the liveness detection component 105 detects that any data node has failed, and to enable the new virtual node to provide data services in place of the failed original data node. Here, the data services include, but are not limited to, data storage services, data query services, and the like.

As an optional implementation, the virtual node creation component 106 may specifically be a Trove component. How the virtual node creation component 106 enables a new virtual node to provide data services in place of the failed original data node is described in detail in the embodiment shown in FIG. 2 below and is not repeated here.

Fig. 2 is a flowchart illustrating a method for handling a failure of a time series database cluster according to an embodiment of the present application, where the time series database cluster may be the time series database cluster 10 illustrated in fig. 1. As shown in fig. 2, the method comprises the following steps:

step 201, when detecting that any data node has a fault, newly building a virtual node.

As one embodiment, in the cluster architecture illustrated in FIG. 1, the liveness detection component 105 may periodically send a liveness detection message to each data node; under normal conditions, a data node returns a response message to the liveness detection component 105 after receiving the liveness detection message. On this basis, if the liveness detection component 105 receives the response message returned by the data node within the current timeout period, it may determine that the data node is normal; conversely, if it does not receive the response message within the current timeout period, it may determine that the data node has failed. Here, the current timeout period is the period that starts at the moment the liveness detection component most recently sent the liveness detection message and lasts for a set duration (for example, 1 second).
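
The timeout rule described in this embodiment can be sketched in Python as a pure function over timestamps. The names `ProbeRecord` and `node_state` are hypothetical, and the extra "pending" result (the timeout window is still open) is an assumption added for the sketch; the text itself only distinguishes normal from failed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeRecord:
    sent_at: float               # when the latest liveness message was sent (seconds)
    replied_at: Optional[float]  # when its response arrived, or None if none yet


def node_state(record, now, timeout=1.0):
    """Classify a node from its latest probe: 'normal', 'fault', or 'pending'."""
    deadline = record.sent_at + timeout  # end of the current timeout period
    if record.replied_at is not None and record.replied_at <= deadline:
        return "normal"
    if now >= deadline:
        return "fault"    # no response arrived inside the window
    return "pending"      # window still open, no verdict yet


print(node_state(ProbeRecord(sent_at=10.0, replied_at=10.2), now=10.5))
print(node_state(ProbeRecord(sent_at=10.0, replied_at=None), now=11.5))
```

Keeping the decision separate from the network I/O makes the rule easy to test; a real component would fill in `sent_at` and `replied_at` from its own send and receive paths.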

As another embodiment, each data node may send a heartbeat message to the liveness detection component 105 at a set period; that is, under normal conditions the liveness detection component 105 receives a heartbeat message from each data node once every set period. On this basis, if the liveness detection component 105 has received the heartbeat message sent by a data node when the set period elapses, it may determine that the data node is normal; conversely, if it has not received the heartbeat message when the set period elapses, it may determine that the data node has failed.
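
A minimal Python sketch of the heartbeat variant follows. The class name `HeartbeatMonitor`, its methods, and the optional `grace` allowance for network jitter are all assumptions made for illustration; the text only requires that a heartbeat arrive within each set period.

```python
class HeartbeatMonitor:
    """Track the last heartbeat per data node and report overdue nodes."""

    def __init__(self, period, grace=0.0):
        self.period = period   # the set heartbeat period, in seconds
        self.grace = grace     # extra allowance for jitter (assumption)
        self.last_seen = {}    # hostname -> time of last heartbeat

    def register(self, hostname, now):
        self.last_seen[hostname] = now

    def on_heartbeat(self, hostname, now):
        self.last_seen[hostname] = now

    def failed_nodes(self, now):
        limit = self.period + self.grace
        return sorted(h for h, t in self.last_seen.items() if now - t > limit)


monitor = HeartbeatMonitor(period=5.0)
monitor.register("cluster.influxdb.54", now=0.0)
monitor.register("cluster.influxdb.238", now=0.0)
monitor.on_heartbeat("cluster.influxdb.54", now=4.0)
print(monitor.failed_nodes(now=6.0))  # ['cluster.influxdb.238']
```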

In addition, it should be noted that node information, such as a host name, an IP address, and the like, of each data node in the time-series database cluster is maintained on the liveness detection component 105, based on which the liveness detection component 105 can send a liveness detection message to each data node, and further determine a state of each data node according to a received response message, or determine a state of each data node according to a received heartbeat message.

In the embodiment of the application, when it is detected that any data node fails, a virtual node can be newly established for the failed data node, so that the newly established virtual node can replace the failed data node to provide data services. It should be noted here that the number of newly created virtual nodes is greater than or equal to the number of failed data nodes, that is, at least one virtual node is newly created for each failed data node.

Step 202, copying the configuration file of the failed data node to the virtual node, and mounting the data disk of the failed data node to the virtual node.

In the embodiment of the application, after a virtual node is created for a failed data node, the configuration file of the failed data node is copied to the virtual node, so that the new virtual node inherits the configuration of the original data node, and the data disk of the failed data node is mounted to the virtual node. Note that inheriting the configuration file of the original data node means that the new virtual node also inherits the host name of the original data node.

For example, in the cluster architecture shown in FIG. 1, assuming that the data node 102 fails, a virtual node 102' is created, the configuration file of the data node 102 is copied to the virtual node 102', and the cloud disk mounted by the data node 102 is mounted to the virtual node 102'. This process is equivalent to migrating the time series data on the original data node 102 to the newly created virtual node 102'.

And step 203, updating the IP address in the target mapping relation into the IP address of the virtual node in the host name mapping file of the time sequence database cluster, wherein the target mapping relation refers to the mapping relation corresponding to the failed data node.

It can be understood that the IP address of the new virtual node differs from the IP address of the failed original data node. Therefore, to enable the new virtual node to replace the failed original data node, the IP address in the mapping relation corresponding to the failed data node (hereinafter referred to as the target mapping relation) is updated to the IP address of the new virtual node in the host name mapping file of the time series database cluster. For example, assuming that the data node with the host name "cluster.influxdb.238" fails and the IP address of the virtual node created for it is 10.69.32.235, the entry 10.69.32.238 cluster.influxdb.238 in the host name mapping file exemplified above is updated to 10.69.32.235 cluster.influxdb.238.
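
The mapping update in step 203 amounts to rewriting one entry of the host name mapping file. The Python sketch below operates on the file's text; the function name `update_mapping` is hypothetical, and for simplicity the sketch drops any trailing comment on the rewritten line.

```python
def update_mapping(hosts_text, hostname, new_ip):
    """Return hosts-file text with the entry for `hostname` pointed at `new_ip`."""
    out = []
    replaced = False
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            # keep the host name and aliases, swap only the IP address
            out.append(" ".join([new_ip] + fields[1:]))
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise KeyError(hostname)
    return "\n".join(out) + "\n"


before = "10.69.32.51 cluster.influxdb.51\n10.69.32.238 cluster.influxdb.238\n"
print(update_mapping(before, "cluster.influxdb.238", "10.69.32.235"), end="")
```

Only the IP address column changes; the host name that the metadata refers to stays the same, which is why the metadata needs no modification.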

And step 204, restarting the time sequence database cluster to enable the virtual nodes to replace the failed data nodes to provide data services.

In the embodiment of the present application, after steps 201 to 203 are performed, the time series database cluster is restarted so that the change takes effect and the virtual node can provide data services in place of the failed data node.
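
Taken together, steps 201 to 204 form a fixed sequence. The Python sketch below shows only that ordering: every operation is passed in as a callable, because the actual mechanisms (creating a virtual machine, copying files, attaching a cloud disk, restarting the cluster) depend on the platform and are not specified here. The function name `fail_over` and all parameter names are hypothetical.

```python
from typing import Callable


def fail_over(failed_host: str,
              create_virtual_node: Callable[[], str],
              copy_config: Callable[[str, str], None],
              mount_data_disk: Callable[[str, str], None],
              update_mapping: Callable[[str, str], None],
              restart_cluster: Callable[[], None]) -> str:
    """Run the failover steps in order and return the new node's IP address."""
    new_ip = create_virtual_node()         # step 201: create the virtual node
    copy_config(failed_host, new_ip)       # step 202: copy the configuration file
    mount_data_disk(failed_host, new_ip)   # step 202: mount the data disk
    update_mapping(failed_host, new_ip)    # step 203: repoint the host name
    restart_cluster()                      # step 204: restart so the change takes effect
    return new_ip


# Stub operations that only record the order in which they are called.
log = []
new_ip = fail_over(
    "cluster.influxdb.238",
    create_virtual_node=lambda: log.append("create") or "10.69.32.235",
    copy_config=lambda host, ip: log.append("copy"),
    mount_data_disk=lambda host, ip: log.append("mount"),
    update_mapping=lambda host, ip: log.append("map"),
    restart_cluster=lambda: log.append("restart"),
)
print(new_ip, log)
```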

According to the technical solution provided by the embodiments of the present application, the time series database cluster contains a plurality of data nodes. When any data node is detected to have failed, a new virtual node is created, the configuration file of the failed data node is copied to the virtual node, and the data disk of the failed data node is mounted to the virtual node, so that the new virtual node can provide data services in place of the failed data node. Failover is thereby achieved and the time series database cluster is highly available. Furthermore, the nodes participating in cluster creation are configured by host name when the cluster is established, and after failover the new virtual node inherits the host name of the original data node. The metadata of the time series database cluster therefore does not need to be changed during failover, which improves failover efficiency; and because the metadata remains stable, the reliability and stability of the time series database cluster are also improved.

FIG. 3 is a flowchart of an operating method for a time series database cluster according to an embodiment of the present application, where the time series database cluster may be the time series database cluster 10 illustrated in FIG. 1. As shown in FIG. 3, the method comprises the following steps:

step 301, when a data operation request is received, determining a target data node to be operated from a time sequence database cluster.

Step 302, obtaining the host name of the target data node from the metadata.

Step 303, determining a target IP address corresponding to the acquired host name from the host name mapping file.

And step 304, based on the target IP address, executing data operation corresponding to the data operation request on the target time sequence data to be operated on the target data node.
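
Steps 302 and 303 are a two-stage lookup: metadata gives the host name, and the host name mapping file gives the IP address. The Python sketch below illustrates this with plain dictionaries; the metadata layout, the node ID 3, and the function name `resolve_target` are assumptions made for the example.

```python
def resolve_target(node_id, metadata, hosts):
    """Return (hostname, ip) for a target data node.

    metadata: {node_id: {"hostname": ...}}   (hypothetical layout)
    hosts:    {hostname: ip}, as read from the host name mapping file
    """
    hostname = metadata[node_id]["hostname"]  # step 302: host name from metadata
    ip = hosts[hostname]                      # step 303: IP from the mapping file
    return hostname, ip


metadata = {3: {"hostname": "cluster.influxdb.238"}}
hosts = {"cluster.influxdb.238": "10.69.32.238"}
print(resolve_target(3, metadata, hosts))

# After a failover only the mapping changes; the metadata is untouched.
hosts["cluster.influxdb.238"] = "10.69.32.235"
print(resolve_target(3, metadata, hosts))
```

Because the request path always goes through the mapping file, a replaced node is reached at its new address without any change to the stored metadata.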

In this embodiment of the application, the data operation corresponding to the data operation request may be a data query operation or a data write operation, and the target time series data to be operated on may accordingly be time series data to be queried or time series data to be written. The implementation of steps 301 to 304 is therefore described below for two cases: the data operation is a data query operation, and the data operation is a data write operation.

First, when the data operation corresponding to the data operation request is a data query operation, detailed implementation of the above steps 301 to 304 will be described:

A time series database cluster may have a load balancer, at which data operation requests sent by clients or other devices to the cluster first arrive. After receiving a data operation request, the load balancer sends it to one of the data nodes. The data node may pull the metadata of the time series database cluster from the metadata node at startup.

As described above, the metadata may include time series database information. Accordingly, when receiving a data operation request, the data node first determines whether the database to be queried exists and, if so, determines the shardgroup to which the data to be queried belongs according to the timestamp of that data. The structure of a shardgroup is shown below:

type ShardGroupInfo struct {
    ID        uint64      // auto-increment ID that uniquely identifies the shardgroup
    StartTime time.Time   // start time
    EndTime   time.Time   // end time
    DeletedAt time.Time   // deletion time
    Shards    []ShardInfo // information on the shards in the shardgroup, e.g. shard ID
}

Based on the shardgroup structure above, once the shardgroup to which the data to be queried belongs is determined, all shard IDs corresponding to the data can be determined, followed by the data node IDs bound to those shard IDs. The data node corresponding to each such data node ID is a data node to be operated on (hereinafter referred to as a target data node).
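The lookup from timestamp to shardgroup to target data nodes can be sketched as follows. This is an illustrative sketch only. The `Owners` field on `ShardInfo`, the half-open interval `[StartTime, EndTime)`, and the function name `targetNodes` are assumptions, and a single timestamp stands in for the time range a real query may cover.

```go
package main

import "time"

// ShardInfo binds one shard to the data nodes that hold it.
// The Owners field is an assumption made for this sketch.
type ShardInfo struct {
	ID     uint64
	Owners []uint64 // IDs of the data nodes bound to this shard
}

// ShardGroupInfo is a reduced form of the structure described in the text.
type ShardGroupInfo struct {
	ID        uint64
	StartTime time.Time
	EndTime   time.Time
	Shards    []ShardInfo
}

// contains reports whether t falls in [StartTime, EndTime).
// Treating the end time as exclusive is an assumption.
func (g ShardGroupInfo) contains(t time.Time) bool {
	return !t.Before(g.StartTime) && t.Before(g.EndTime)
}

// targetNodes returns the de-duplicated IDs of the data nodes bound to the
// shards of the first group covering t, in first-seen order.
// ok is false when no group covers t.
func targetNodes(groups []ShardGroupInfo, t time.Time) (nodes []uint64, ok bool) {
	for _, g := range groups {
		if !g.contains(t) {
			continue
		}
		seen := make(map[uint64]bool)
		for _, s := range g.Shards {
			for _, n := range s.Owners {
				if !seen[n] {
					seen[n] = true
					nodes = append(nodes, n)
				}
			}
		}
		return nodes, true
	}
	return nil, false
}
```

A `false` second result corresponds to the case where no shardgroup exists for the timestamp, which matters on the write path described later.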

If the target data node is the current data node (i.e., the data node that received the data operation request), the data query operation may be performed locally. If it is not, the current data node can further acquire the node information of the target data node from the metadata. In the embodiment of the application, the node information includes at least the host name of the node, so the current data node can acquire the host name of the target data node from the metadata.

Further, the host name mapping file stores the mapping relationship between host names and IP addresses. After acquiring the host name from the node information in the metadata, the data node can therefore determine from the host name mapping file the target IP address corresponding to that host name, that is, the IP address of the target data node. The data node then sends the data operation request to the target data node at the target IP address, so that the data query operation is executed on the target data node.
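Steps 302 and 303 amount to two table lookups, which the following Go sketch makes concrete. It assumes the metadata exposes a node ID to host name table and that the mapping file is in `/etc/hosts` format; `parseHosts`, `resolveTarget`, and the parameter names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseHosts builds a host name -> IP table from hosts-style text.
// Assumption: lines look like "IP name [alias...]", with "#" starting a comment.
func parseHosts(hosts string) map[string]string {
	table := make(map[string]string)
	for _, line := range strings.Split(hosts, "\n") {
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // drop the comment part of the line
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue // blank line or an IP with no names
		}
		for _, name := range fields[1:] {
			table[name] = fields[0]
		}
	}
	return table
}

// resolveTarget first looks up the node's host name in the metadata table
// (step 302), then the IP address for that host name in the mapping table
// (step 303).
func resolveTarget(nodeHosts map[uint64]string, hosts map[string]string, nodeID uint64) (string, error) {
	name, ok := nodeHosts[nodeID]
	if !ok {
		return "", fmt.Errorf("node %d not found in metadata", nodeID)
	}
	ip, ok := hosts[name]
	if !ok {
		return "", fmt.Errorf("host name %q not found in mapping file", name)
	}
	return ip, nil
}
```

The indirection is the point of the design: after a failover only the mapping table changes, so the metadata table passed as `nodeHosts` stays exactly as it was.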

In addition, after the data node executes data query operation, the queried data is sent to the load balancer, and the load balancer forwards the data to the client. It should be noted that, when the number of the target data nodes is more than 1, the load balancer receives data queried by each target data node, merges the data, and then forwards the merged data to the client.
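The text does not say how the load balancer merges the per-node results. As one possible reading, the sketch below concatenates the results and orders them by timestamp; the `Point` type and the `mergeResults` name are hypothetical.

```go
package main

import "sort"

// Point is a hypothetical query result row.
type Point struct {
	Timestamp int64 // Unix nanoseconds
	Value     float64
}

// mergeResults combines the results returned by each target data node into
// one slice ordered by timestamp. The stable sort keeps rows with equal
// timestamps in the order in which the nodes were passed.
func mergeResults(perNode ...[]Point) []Point {
	var merged []Point
	for _, pts := range perNode {
		merged = append(merged, pts...)
	}
	sort.SliceStable(merged, func(i, j int) bool {
		return merged[i].Timestamp < merged[j].Timestamp
	})
	return merged
}
```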

Next, when the data operation corresponding to the data operation request is a data write operation, detailed implementation of the steps 301 to 304 will be described:

Similar to the data query operation, after receiving a data operation request, the load balancer sends it to one of the data nodes. On receiving the request, the data node first determines whether the database to be written exists and, if so, calculates the shardgroup to which the data to be written belongs according to the timestamp of that data. Two cases are distinguished here:

In the first case, when the shardgroup to which the data to be written belongs exists, all shards contained in the shardgroup can be determined from the shardgroup structure, and the data node corresponding to each shard, that is, the target data node to be operated on, is then determined according to the binding relationship between shard IDs and data node IDs.

If the target data node is not the data node, the data node can further acquire the node information of the target data node from the metadata. In the embodiment of the application, the node information at least comprises the host name of the node, so that the data node can acquire the host name of the target data node from the metadata.

Of course, if the target data node is the data node that received the request, the data write operation may be performed directly on that data node.

It should be noted that, after a data node performs the data write operation successfully, it may return an indication message to the client that the write succeeded. The client may determine that the data write succeeded once it has received indication messages from N data nodes, and may determine that the data write failed if it has not received indication messages from N data nodes. When the write is determined to have failed, the client may resend the data operation request to the time series database cluster after a set interval to request the write again. Here, N is equal to or greater than-1, the number of copies to which the data to be written corresponds.
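The client-side success check and retry can be sketched as below. This is a hedged sketch: the threshold N is passed in as `required` rather than derived from the replica count, and `writeSucceeded`, `writeWithRetry`, the `send` callback, and the attempt limit are all hypothetical.

```go
package main

import "time"

// writeSucceeded reports whether at least required data nodes returned a
// success indication. acks holds one entry per target data node.
func writeSucceeded(acks []bool, required int) bool {
	count := 0
	for _, ok := range acks {
		if ok {
			count++
		}
	}
	return count >= required
}

// writeWithRetry calls send, which issues the write request and returns the
// per-node acknowledgements, until the write succeeds or maxAttempts is
// reached. It waits for interval between attempts, matching the "set
// interval" in the text. The attempt limit is an assumption of this sketch.
func writeWithRetry(send func() []bool, required, maxAttempts int, interval time.Duration) bool {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if writeSucceeded(send(), required) {
			return true
		}
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return false
}
```

Keeping the threshold as a parameter lets the same check serve whatever relationship between N and the number of copies a deployment adopts.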

In the second case, when the shardgroup to which the data to be written belongs does not exist, a new shardgroup is created. Its start time is the timestamp of the data to be written and its end time is the start time plus a set time span. At the same time, a new shard ID is created and the data nodes to be bound to it are determined. The subsequent operations are the same as described above and are not repeated here.

It should be noted that the number of data nodes to which the shard ID needs to be bound is determined by the number of copies corresponding to the data to be written. For example, if the number of copies of the data to be written is 3, the shard ID needs to bind 3 data nodes.
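Creating the shardgroup in the second case can be sketched as follows. The sketch assumes a single new shard per shardgroup and picks the first `replicas` nodes from the candidate list; both choices, the `Owners` field, and the name `newShardGroup` are illustrative assumptions, since the text does not say how the nodes to bind are selected.

```go
package main

import (
	"fmt"
	"time"
)

// ShardInfo binds one shard to its data nodes (Owners is an assumed field).
type ShardInfo struct {
	ID     uint64
	Owners []uint64
}

// ShardGroupInfo is a reduced form of the structure described in the text.
type ShardGroupInfo struct {
	ID        uint64
	StartTime time.Time
	EndTime   time.Time
	Shards    []ShardInfo
}

// newShardGroup creates a shardgroup whose start time is the timestamp of the
// data to be written and whose end time is the start time plus span. The new
// shard is bound to as many data nodes as the data has copies.
func newShardGroup(groupID, shardID uint64, ts time.Time, span time.Duration,
	nodeIDs []uint64, replicas int) (ShardGroupInfo, error) {
	if replicas < 1 || replicas > len(nodeIDs) {
		return ShardGroupInfo{}, fmt.Errorf(
			"cannot bind %d copies to %d data nodes", replicas, len(nodeIDs))
	}
	owners := make([]uint64, replicas)
	copy(owners, nodeIDs[:replicas]) // assumption: take the first nodes listed
	return ShardGroupInfo{
		ID:        groupID,
		StartTime: ts,
		EndTime:   ts.Add(span),
		Shards:    []ShardInfo{{ID: shardID, Owners: owners}},
	}, nil
}
```

With three copies and four candidate nodes, the new shard ends up bound to three data nodes, which matches the example in the text.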

According to the technical solution provided by this embodiment of the application, when a data operation request is received, the host name of the target data node is obtained from the metadata, the target IP address corresponding to that host name is determined from the host name mapping file, and the data operation corresponding to the request is executed on the target time series data on the target data node based on the target IP address. Data operations can thus be performed on a time series database cluster whose nodes were configured by host name when the cluster was established. Because the metadata does not need to be changed during failover, the metadata of the cluster remains stable and the cluster is highly available.

Fig. 4 is a schematic diagram of a fault handling apparatus of a time series database cluster according to an embodiment of the present application, where the time series database cluster may be the time series database cluster 10 illustrated in fig. 1. As shown in fig. 4, the apparatus includes:

a node newly building module 41, configured to newly build a virtual node when detecting that any data node fails;

a node configuration module 42, configured to copy a configuration file of the failed data node to the virtual node, and mount a data disk of the failed data node to the virtual node;

a mapping module 43, configured to update, in the host name mapping file of the time series database cluster, the IP address in a target mapping relationship to the IP address of the virtual node, where the target mapping relationship is the mapping relationship corresponding to the failed data node;

a restarting module 44, configured to restart the time series database cluster, so that the virtual node provides data service in place of the failed data node.

the fault detection module is specifically configured to:

and for each data node, if the heartbeat message sent by the data node is not received when a set period is reached, determining that the data node fails.
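The heartbeat rule above can be sketched as a single pass over the last heartbeat time of each node. This is a minimal sketch with assumed names: timestamps are plain Unix seconds, and `failedNodes` and its parameters are hypothetical.

```go
package main

import "sort"

// failedNodes returns, in sorted order, the data nodes from which no
// heartbeat has been received within the set period.
// lastSeen maps a node's host name to the Unix time (seconds) of its latest
// heartbeat; now and period are also expressed in seconds.
func failedNodes(lastSeen map[string]int64, now, period int64) []string {
	var failed []string
	for name, seen := range lastSeen {
		if now-seen > period {
			failed = append(failed, name)
		}
	}
	sort.Strings(failed) // map iteration order is random, so sort for stable output
	return failed
}
```

A node whose heartbeat arrived exactly one period ago is still treated as healthy in this sketch; only a longer silence marks it as failed.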

Fig. 5 is a schematic diagram of an operating apparatus of a time series database cluster according to an embodiment of the present application, where the time series database cluster may be the time series database cluster 10 illustrated in fig. 1. As shown in fig. 5, the apparatus includes:

a target determining module 51, configured to determine, when a data operation request is received, a target data node to be operated on from the time series database cluster;

a first obtaining module 52, configured to obtain a host name of the target data node from the metadata;

a second obtaining module 53, configured to determine, from the host name mapping file, a target IP address corresponding to the obtained host name;

and a data operation module 54, configured to perform, on the target data node, a data operation corresponding to the data operation request on the target time series data to be operated based on the target IP address.

In a possible implementation manner, the target determining module 51 determines the target data node to be operated on from the time series database cluster by:

determining all the shards contained in the target shardgroup;

and determining the data nodes bound to the shards as the data nodes to be operated on.

As shown in fig. 6, the embodiment of the present application provides an electronic device, which includes a processor 611, a communication interface 612, a memory 613, and a communication bus 614, wherein the processor 611, the communication interface 612, and the memory 613 communicate with each other through the communication bus 614,

a memory 613 for storing computer programs;

In an embodiment of the present application, the processor 611 is configured to execute the program stored in the memory 613 to implement the fault handling method of the time series database cluster provided by the foregoing method embodiment, the method including:

when any data node is detected to be out of order, a virtual node is newly established;

copying the configuration file of the failed data node to the virtual node, and mounting the data disk of the failed data node to the virtual node;

restarting the time series database cluster to enable the virtual node to provide data services in place of the failed data node.

Or, the method for implementing the operation of the time series database cluster provided by the foregoing method embodiment includes:

when a data operation request is received, determining a target data node to be operated from the time sequence database cluster;

acquiring the host name of the target data node from the metadata;

determining a target IP address corresponding to the acquired host name from the host name mapping file;

and executing data operation corresponding to the data operation request on the target data node according to the target IP address and the target time sequence data to be operated.

An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for handling a failure of a time-series database cluster or the method for operating a time-series database cluster, as provided in any of the method embodiments described above.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
