Data processing method and device

Document No.: 1937235 Publication date: 2021-12-07

Reading note: this technology, "Data processing method and device", was designed by 袁博文, 彭安, 刘珊 and 孟可 on 2021-01-08. Its main content is as follows. The invention discloses a data processing method and device, and relates to the technical field of computers. One embodiment of the method comprises: acquiring a redo log file of a database master node; parsing the redo log file of the master node to obtain log data and corresponding identification information, storing the log data into a physical transaction, and writing the physical transaction into a local redo log file; and acquiring a physical data page corresponding to the physical transaction from the local redo log file according to the identification information, and executing corresponding data processing according to the physical data page. This embodiment significantly reduces the data synchronization delay and the data synchronization cost between the master node and the slave node of the database, and improves the scalability of data synchronization and the system performance of the database.

1. A data processing method, comprising:

acquiring a redo log file of a database master node;

parsing the redo log file of the master node to obtain log data and corresponding identification information, storing the log data into a physical transaction, and writing the physical transaction into a local redo log file;

and acquiring a physical data page corresponding to the physical transaction from the local redo log file according to the identification information, and executing corresponding data processing according to the physical data page.

2. The data processing method of claim 1, wherein the obtaining the redo log file of the database master node further comprises:

determining an initial consistent position point corresponding to a master node and a slave node of the database, obtaining a redo log file after a log sequence number corresponding to the initial consistent position point, and storing the redo log file into a cache corresponding to the slave node.

3. The data processing method of claim 2, wherein the parsing the redo log file of the master node to obtain log data and corresponding identification information thereof, storing the log data to a physical transaction, and writing the physical transaction into a local redo log file further comprises:

traversing and acquiring the redo log file in the cache, performing classification analysis processing on the redo log file according to the log type, extracting identification information and log data, storing the log data into a physical transaction, and writing the physical transaction into a local redo log file; wherein the identification information includes a tablespace number and a data page number.

4. The data processing method of claim 3, wherein storing the log data to a physical transaction and writing the physical transaction to a local redo log file further comprises:

storing the log data into a physical transaction, and determining the physical transaction corresponding to a single log type and the physical transaction corresponding to multiple log types;

allocating log sequence numbers to the physical transactions corresponding to the single log type according to the fixed log data length, and allocating log sequence numbers to the physical transactions corresponding to the multiple log types according to the actual log data length;

and writing the log data into a local redo log file according to the log sequence number corresponding to the physical transaction.

5. The data processing method according to claim 1, wherein the obtaining, according to the identification information, the physical data page corresponding to the physical transaction from the local redo log file further comprises:

acquiring a physical data page corresponding to the physical transaction from the local redo log file according to the corresponding relation between the identification information and the physical transaction in the hash table; wherein the hash table is created after the step of storing the log data to a physical transaction.

6. The data processing method of claim 1, wherein in the event that the local redo log file generates a write operation, the method further comprises:

intercepting the write operation, and determining an operation source of the write operation;

and correcting the write state according to the operation source.

7. The data processing method of claim 6, further comprising:

correcting the transaction state of the slave node corresponding to the undo log by at least one of the following methods: prohibiting the slave node from regenerating transactions, and cleaning the rollback queue.

8. A data processing apparatus, comprising:

a file acquisition module, configured to acquire a redo log file of a database master node;

a parsing module, configured to parse the redo log file of the master node to obtain log data and corresponding identification information, store the log data into a physical transaction, and write the physical transaction into a local redo log file;

and a data processing module, configured to acquire a physical data page corresponding to the physical transaction from the local redo log file according to the identification information, and execute corresponding data processing according to the physical data page.

9. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus.

Background

In the master-slave mode of a relational database (e.g. MySQL), one master node corresponds to one or more slave nodes; the master node is responsible for writes and the slave nodes are responsible for reads. This relieves the single-point load bottleneck, and even if the master node is locked or crashes, normal operation of the service can be maintained by reading from the slave nodes. To achieve this, data synchronization between the master node and the slave nodes of the database must be ensured.

Existing methods mainly rely on one of two logs. With the binary log (binlog for short), the master node server sends the generated binlog file to the slave node server after a transaction is committed; the slave node server receives the binlog and applies the logical log generated by the master node to its own database system, thereby achieving master-slave data synchronization. With the redo log (redolog), the log records the physical changes made to the actual data files (which data at which position of the data file was changed, and how); only after the redolog has been stored on the disk of the slave node is the actual physical data page modified and stored on disk, thereby achieving master-slave data synchronization.

The prior art has at least the following problems:

Existing methods suffer from long data synchronization delay between the master node and the slave node, high data synchronization cost, poor scalability, and a negative impact on the system performance of the database.

Disclosure of Invention

In view of this, embodiments of the present invention provide a data processing method and apparatus that can significantly reduce the data synchronization delay between the master node and the slave node of a database, reduce the data synchronization cost, improve the scalability of data synchronization, and improve the system performance of the database.

To achieve the above object, according to a first aspect of embodiments of the present invention, there is provided a data processing method including:

acquiring a redo log file of a database master node;

parsing the redo log file of the master node to obtain log data and corresponding identification information, storing the log data into a physical transaction, and writing the physical transaction into a local redo log file;

and acquiring a physical data page corresponding to the physical transaction from the local redo log file according to the identification information, and executing corresponding data processing according to the physical data page.

Further, acquiring the redo log file of the database master node further includes:

determining an initial consistent position point corresponding to a master node and a slave node of a database, obtaining a redo log file after a log sequence number corresponding to the initial consistent position point, and storing the redo log file into a cache corresponding to the slave node.

Further, parsing the redo log file of the master node to obtain log data and identification information corresponding to the log data, storing the log data to a physical transaction, and writing the physical transaction into the local redo log file, further includes:

traversing and acquiring the redo log file in the cache, performing classification analysis processing on the redo log file according to the log type, extracting identification information and log data, storing the log data into a physical transaction, and writing the physical transaction into a local redo log file; wherein the identification information includes a tablespace number and a data page number.

Further, storing the log data to a physical transaction, and writing the physical transaction to a local redo log file, further includes:

storing the log data into a physical transaction, and determining the physical transaction corresponding to a single log type and the physical transaction corresponding to a plurality of log types;

allocating log sequence numbers to physical transactions corresponding to a single log type according to the fixed log data length, and allocating log sequence numbers to physical transactions corresponding to multiple log types according to the actual log data length;

and writing the log data into a local redo log file according to the log sequence number corresponding to the physical transaction.

Further, acquiring a physical data page corresponding to the physical transaction from the local redo log file according to the identification information, further comprising:

acquiring a physical data page corresponding to the physical transaction from the local redo log file according to the corresponding relation between the identification information and the physical transaction in the hash table; wherein the hash table is created after the step of storing the log data to the physical transaction.

Further, in a case where the local redo log file generates a write operation, the method further includes:

intercepting a write operation, and determining an operation source of the write operation;

and correcting the write state according to the operation source.

Further, the method also includes:

correcting the transaction state of the slave node corresponding to the undo log by at least one of the following methods: prohibiting the slave node from regenerating transactions, and cleaning the rollback queue.

According to a second aspect of the embodiments of the present invention, there is provided a data processing apparatus including:

a file acquisition module, configured to acquire a redo log file of a database master node;

a parsing module, configured to parse the redo log file of the master node to obtain log data and corresponding identification information, store the log data into a physical transaction, and write the physical transaction into a local redo log file;

and a data processing module, configured to acquire a physical data page corresponding to the physical transaction from the local redo log file according to the identification information, and execute corresponding data processing according to the physical data page.

According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including:

one or more processors;

a storage device for storing one or more programs,

when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement any of the data processing methods described above.

According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing any one of the data processing methods described above.

One embodiment of the above invention has the following advantages or benefits. The redo log file of the database master node is acquired; the redo log file of the master node is parsed to obtain log data and corresponding identification information, the log data is stored into a physical transaction, and the physical transaction is written into a local redo log file; and a physical data page corresponding to the physical transaction is acquired from the local redo log file according to the identification information, and corresponding data processing is executed according to the physical data page. These technical means overcome the problems of existing methods, namely long data synchronization delay between the master node and the slave node, high data synchronization cost, poor scalability, and a negative impact on the system performance of the database. They thereby achieve the technical effects of significantly reducing the master-slave data synchronization delay, reducing the data synchronization cost, improving the scalability of data synchronization, and improving the system performance of the database.

Further effects of the above optional, non-conventional implementations will be described below in connection with the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of a main flow of a data processing method provided according to a first embodiment of the present invention;

FIG. 2a is a schematic diagram of a main flow of a data processing method according to a second embodiment of the present invention;

FIG. 2b is a schematic diagram of a framework corresponding to the method of FIG. 2a;

FIG. 3 is a schematic diagram of the main blocks of a data processing apparatus according to an embodiment of the present invention;

FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

FIG. 1 is a schematic diagram of a main flow of a data processing method provided according to a first embodiment of the present invention. As shown in FIG. 1, the data processing method provided in the embodiment of the present invention mainly includes:

Step S101, acquiring a redo log file of a database master node.

The redo log file (redolog) records the physical changes made to the actual data files (which data at which position of the data file was changed). Only after the redolog has been stored on disk is the actual physical data page modified and stored on disk. This design makes MySQL data writes very safe: even if the database crashes midway through a write, crash recovery of the local MySQL instance can be performed using the redolog stored on disk.

Further, according to an embodiment of the present invention, the obtaining of the redo log file of the database master node further includes:

determining an initial consistent position point corresponding to a master node and a slave node of a database, obtaining a redo log file after a log sequence number corresponding to the initial consistent position point, and storing the redo log file into a cache corresponding to the slave node.

Because the data in the redo log file cannot be applied either to an inconsistent database or at an inconsistent position point of a consistent database, maintaining the initial consistent position point is the basis for implementing the present application (according to a specific implementation of an embodiment of the present invention, the initial consistent position point is obtained through the MySQL 8.0 clone function).

Specifically, according to the embodiment of the present invention, the LSN (Log Sequence Number, which indicates the position of a specific log record within the log file) of the consistent position point is recorded and taken as the initial LSN. The slave node of the database initiates a replication request to the master node to obtain the redolog starting from the initial LSN, records the end LSN of the obtained redolog, and stores the redo log file in the cache corresponding to the slave node. With this arrangement, the redolog of the master node database can, through the subsequent operations, be successfully redone and applied to the slave node database by means of redolog physical replication, which avoids the system performance degradation and master-slave delay caused by binlog logical replication in existing methods.

Preferably, according to the embodiment of the present invention, the redo log file after the initial consistent position point is obtained through a communication protocol and communication request agreed upon by the slave node and the master node of the database.
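The request flow of step S101 can be illustrated with a minimal sketch. It is not the implementation of the embodiment: the classes `MasterNode` and `SlaveNode`, the method names, and the modeling of an LSN as a byte offset are all assumptions made for illustration, and the agreed communication protocol is replaced by a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class MasterNode:
    """Hypothetical stand-in for the master; redo records are (start_lsn, payload)."""
    records: list  # sorted by start_lsn

    def read_redo_from(self, start_lsn):
        # Return every record starting at or after start_lsn, together with
        # the LSN just past the last returned record (the end LSN).
        chunk = [(lsn, data) for lsn, data in self.records if lsn >= start_lsn]
        end_lsn = chunk[-1][0] + len(chunk[-1][1]) if chunk else start_lsn
        return chunk, end_lsn


@dataclass
class SlaveNode:
    next_lsn: int                       # starts at the initial consistent position point
    cache: list = field(default_factory=list)

    def pull(self, master):
        chunk, end_lsn = master.read_redo_from(self.next_lsn)
        self.cache.extend(chunk)        # store the redo log in the slave-side cache
        self.next_lsn = end_lsn         # record the end LSN for the next request
        return len(chunk)
```

Recording the end LSN lets the next request resume exactly where the previous one stopped, so no record is fetched twice or skipped.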

Step S102, parsing the redo log file of the master node to obtain log data and corresponding identification information, storing the log data into a physical transaction, and writing the physical transaction into the local redo log file.

Specifically, according to the embodiment of the present invention, parsing of the redo log file in the cache follows the design rules of the InnoDB storage engine: taking the mtr (mini-transaction, also called a physical transaction) as the operation unit and the mini log (MLOG) as the parsing unit, the redo log file of the master node is parsed, the log data is stored into a physical transaction, and the physical transaction is written into the local redo log file.

Further, according to the embodiment of the present invention, the analyzing the redo log file of the master node to obtain log data and identification information corresponding to the log data, storing the log data in a physical transaction, and writing the physical transaction in the local redo log file further includes:

traversing and acquiring the redo log file in the cache, performing classification analysis processing on the redo log file according to the log type, extracting identification information and log data, storing the log data into a physical transaction, and writing the physical transaction into a local redo log file; the identification information includes a table space number and a data page number.

After the redo log file is stored into the cache corresponding to the slave node, a parsing thread (log parser) is activated to parse the redo log. According to a specific implementation of the embodiment of the invention, the cache is traversed continuously. If there is no data (that is, no redo log file exists in the cache), the parsing thread sleeps and waits. If a redo log file exists, it is acquired and parsed by classification according to log type; the space_id (tablespace number) and page_no (data page number), which constitute the identification information, are extracted; the log data is stored into a physical transaction; and the physical transaction is written into the local redo log file. With this arrangement, the redo log file of the master node is parsed and reconstructed into locally usable physical transactions, which lays the foundation for the slave node to subsequently apply the data of the physical transactions, while significantly reducing the master-slave data synchronization delay and the data synchronization cost.
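The behavior of the parsing thread described above might be sketched as follows. The record layout `(log_type, space_id, page_no, body)`, the two type codes, and the function names are hypothetical; the real InnoDB binary redo format and its MLOG type values are not reproduced here.

```python
import time
from collections import deque

# Hypothetical log-type codes used only in this sketch.
TYPE_PAGE_WRITE = 1   # record that modifies a data page
TYPE_MTR_END = 2      # marker that closes a physical transaction (mtr)


def parse_record(record):
    """Classify one cached record by type and extract its identification info."""
    log_type, space_id, page_no, body = record
    if log_type == TYPE_PAGE_WRITE:
        return {"type": log_type, "space_id": space_id,
                "page_no": page_no, "body": body}
    if log_type == TYPE_MTR_END:
        return {"type": log_type}
    raise ValueError(f"unknown log type {log_type}")


def parser_loop(cache, local_redo, stop, idle_seconds=0.01):
    """Drain the cache, sleeping while it is empty, and group records into mtrs."""
    mtr = []
    while not stop():
        if not cache:
            time.sleep(idle_seconds)     # no data in the cache: sleep and wait
            continue
        parsed = parse_record(cache.popleft())
        if parsed["type"] == TYPE_MTR_END:
            local_redo.append(mtr)       # write the finished mtr to the local redo log
            mtr = []
        else:
            mtr.append(parsed)
```

In a real deployment the loop would run in its own thread and `stop` would be a shutdown flag; here it is an ordinary callable so the loop can be exercised synchronously.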

Preferably, according to an embodiment of the present invention, the storing the log data into the physical transaction, and writing the physical transaction into the local redo log file further includes:

storing the log data into a physical transaction, and determining the physical transaction corresponding to a single log type and the physical transaction corresponding to a plurality of log types;

allocating log sequence numbers to physical transactions corresponding to a single log type according to the fixed log data length, and allocating log sequence numbers to physical transactions corresponding to multiple log types according to the actual log data length;

and writing the log data into a local redo log file according to the log sequence number corresponding to the physical transaction.

According to the embodiment of the invention, after the redo log file is parsed with the mini log as the parsing unit, the log data is stored into physical transactions, which are divided into single-log-type and multi-log-type physical transactions. For a single-log-type physical transaction, the log sequence number corresponding to its log data length is allocated directly. For a multi-log-type physical transaction, the length to be allocated is accumulated during parsing according to the length of each parsing unit, and after all mini log units have been parsed, the corresponding log sequence number is allocated according to the actual log data length. With this arrangement, log sequence number pre-allocation is combined with the log system design of the storage engine's physical transactions: a log sequence number is pre-allocated to each physical transaction before it is applied (that is, before the physical data pages in the physical transaction are applied), which ensures that the physical transactions are subsequently applied continuously and in order.
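The two allocation paths can be illustrated with a short sketch. The class `LsnAllocator`, the constant `FIXED_LEN`, and the treatment of an LSN as a byte offset are assumptions for illustration only; the embodiment does not specify concrete lengths.

```python
class LsnAllocator:
    """Hands out contiguous LSN ranges; LSNs are modeled as byte offsets (assumption)."""

    def __init__(self, start_lsn):
        self.next_lsn = start_lsn

    def reserve(self, length):
        start = self.next_lsn
        self.next_lsn += length
        return start, self.next_lsn      # half-open range [start, end)


# Hypothetical fixed record length for a single-log-type physical transaction.
FIXED_LEN = 16


def allocate_for_mtr(allocator, records):
    """records: list of byte strings, one per mini log unit in the mtr."""
    if len(records) == 1:
        # Single log type: the length is known up front, so reserve directly.
        return allocator.reserve(FIXED_LEN)
    # Multiple log types: accumulate the length unit by unit while parsing,
    # then reserve once, using the actual total length.
    total = 0
    for rec in records:
        total += len(rec)
    return allocator.reserve(total)
```

Because each reservation starts where the previous one ended, the ranges are contiguous and ordered, which is the property the pre-allocation is meant to guarantee.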

Step S103, acquiring a physical data page corresponding to the physical transaction from the local redo log file according to the identification information, and executing corresponding data processing according to the physical data page.

The redo log is typically a physical log: it records the physical modifications made to a data page rather than which row or rows were modified, and it is used to restore committed physical data pages (a data page is restored only up to the last committed position).

Specifically, according to an embodiment of the present invention, the acquiring a physical data page corresponding to a physical transaction from a local redo log file according to identification information further includes:

acquiring a physical data page corresponding to the physical transaction from the local redo log file according to the corresponding relation between the identification information and the physical transaction in the hash table; wherein the hash table is created after the step of storing the log data to the physical transaction.

After the step of writing the log data into the local redo log file according to the embodiment of the present invention, an application thread is activated to apply the physical transactions to the buffer pool. Specifically, the parsed physical transactions are stored in the hash table according to tablespace number and data page number. The physical data page can therefore be looked up in the buffer pool by tablespace number and data page number: if the page is present, the log data is applied to it directly; if it is not, the page is first read from disk. This arrangement improves the scalability of data synchronization and the system performance of the database.
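A minimal sketch of the hash-table lookup and apply path follows. The buffer pool is modeled as a plain dictionary, a page as a dictionary from offset to value, and `read_page_from_disk` is a hypothetical callback; none of these reflect actual InnoDB data structures.

```python
from collections import defaultdict


def build_hash_table(mtrs):
    """Group parsed records by (space_id, page_no), keeping arrival order per page."""
    table = defaultdict(list)
    for mtr in mtrs:
        for rec in mtr:
            table[(rec["space_id"], rec["page_no"])].append(rec)
    return table


def apply_all(table, buffer_pool, read_page_from_disk):
    """Apply each page's records together; read the page from disk only on a miss."""
    disk_reads = 0
    for key, recs in table.items():
        page = buffer_pool.get(key)
        if page is None:
            page = read_page_from_disk(key)      # hypothetical disk reader
            buffer_pool[key] = page
            disk_reads += 1
        for rec in recs:
            page[rec["offset"]] = rec["value"]   # toy stand-in for a physical change
    return disk_reads
```

Grouping by page means each page is fetched at most once per batch, however many physical transactions touched it.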

According to the embodiment of the invention, a hash table is unordered, whereas the redo log file is applied in order. The advantage of the hash table is that data can be applied per tablespace number and data page number, which reduces how often a physical data page in the buffer pool is updated and reduces the number of disk reads. Alternatively, the ordering required by redo log application can be preserved by a linear application technique, in which a linear storage structure, rather than a hash table structure, is used to store the physical transactions.

Further, according to an embodiment of the present invention, in a case where the local redo log file generates a write operation, the method further includes:

intercepting a write operation, and determining an operation source of the write operation;

and correcting the write state according to the operation source.

Every MySQL instance needs to store the metadata of its own database. Since MySQL 8.0, MySQL metadata is stored entirely by the InnoDB storage engine, so all metadata modifications are written into the redolog. If the local redolog could be written freely, the changes to local physical data pages would become inconsistent with the changes recorded in the redolog of the master node. At best this inconsistency causes data application to fail; at worst it directly corrupts the physical files and makes the database unavailable. With this arrangement, the consistency of the node data is ensured by the dual guarantee of external interception and internal verification, which ensures that the local redo log file of the slave node remains applicable and improves the scalability of data synchronization.
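The interception step can be sketched as a guard placed in front of the local redo log. The embodiment does not specify how the write state is corrected; discarding writes that do not originate from replication is only one possible policy, and the names `Source` and `GuardedRedoLog` are hypothetical.

```python
from enum import Enum


class Source(Enum):
    REPLICATION = "replication"   # physical transactions parsed from the master
    LOCAL = "local"               # anything generated on the slave itself


class GuardedRedoLog:
    """Toy local redo log that intercepts every write and checks its source."""

    def __init__(self):
        self.records = []
        self.rejected = []

    def write(self, source, record):
        if source is Source.REPLICATION:
            self.records.append(record)
            return True
        # Assumed correction policy: a locally generated write is not allowed
        # to reach the redo log; it is set aside for inspection instead.
        self.rejected.append(record)
        return False
```

Routing every write through a single entry point is what makes the source check enforceable; the log contents then depend only on what the master produced.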

Preferably, according to an embodiment of the present invention, the method further includes:

correcting the transaction state of the slave node corresponding to the undo log by at least one of the following methods: prohibiting the slave node from regenerating transactions, and cleaning the rollback queue.

The undo tablespace (undospace), also called the undo log (undolog), is an important means by which InnoDB guarantees database transactions. The undolog and the redolog cooperate to ensure the consistency of database transactions during commit, rollback, and crash recovery. However, the undolog itself generates redolog while it is executed; if this happens on the slave node while redolog data is being applied, free writes to the local redo log inevitably result. With this arrangement, in addition to external interception and internal verification, the state of the undo tablespace is corrected by prohibiting the slave node from regenerating transactions and by cleaning the rollback queue. According to an embodiment of the present invention, a restarted slave node would ordinarily regenerate its incomplete transactions and then execute the undolog; this regeneration is prohibited, because the effects of the undo operations can be obtained by applying the redolog. The rollback queue is cleaned on the same principle. This keeps the state of the slave node consistent with the state of the tablespace, further ensures the consistency of the log sequence numbers, and significantly improves the system performance of the database.
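The restart behavior described above can be contrasted with ordinary recovery in a small sketch. The function name, the `is_replica` flag, and the dictionary layout of an undo entry are hypothetical; the sketch only illustrates the decision, not InnoDB's recovery code.

```python
def recover_transactions(undo_entries, is_replica):
    """Return (active_transactions, rollback_queue) after a restart."""
    if is_replica:
        # Slave node: incomplete transactions are not regenerated and the
        # rollback queue is left empty, because the effects of any undo
        # operation arrive through the master's redo log instead.
        return [], []
    # Ordinary node: rebuild incomplete transactions and queue them for rollback.
    active = [e["trx_id"] for e in undo_entries if not e["committed"]]
    return active, list(active)
```

With nothing to roll back locally, the slave node generates no redo of its own during recovery, which keeps its log sequence numbers aligned with the master's.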

According to the technical solution of the embodiment of the invention, the redo log file of the database master node is acquired; the redo log file of the master node is parsed to obtain log data and corresponding identification information, the log data is stored into a physical transaction, and the physical transaction is written into a local redo log file; and a physical data page corresponding to the physical transaction is acquired from the local redo log file according to the identification information, and corresponding data processing is executed according to the physical data page. These technical means overcome the problems of existing methods, namely long data synchronization delay between the master node and the slave node, high data synchronization cost, poor scalability, and a negative impact on the system performance of the database. They thereby achieve the technical effects of significantly reducing the master-slave data synchronization delay, reducing the data synchronization cost, improving the scalability of data synchronization, and improving the system performance of the database.

FIG. 2a is a schematic diagram of a main flow of a data processing method according to a second embodiment of the present invention. As shown in FIG. 2a, the data processing method provided by the present invention mainly includes:

Step S201, determining an initial consistent position point corresponding to a master node and a slave node of a database, obtaining a redo log file after a log sequence number corresponding to the initial consistent position point, and storing the redo log file into a cache corresponding to the slave node.

Because the data in the redo log file cannot be applied either to an inconsistent database or at an inconsistent position point of a consistent database, maintaining the initial consistent position point is the basis for implementing the present application; according to a specific implementation of an embodiment of the present invention, the initial consistent position point is obtained through the MySQL 8.0 clone function.

Specifically, according to the embodiment of the present invention, the LSN (Log Sequence Number, which indicates the position of a specific log record within the log file) of the consistent position point is recorded and taken as the initial LSN. The slave node of the database initiates a replication request to the master node to obtain the redolog starting from the initial LSN, records the end LSN of the obtained redolog, and stores the redo log file in the cache corresponding to the slave node. With this arrangement, the redolog of the master node database can, through the subsequent operations, be successfully redone and applied to the slave node database by means of redolog physical replication, which avoids the system performance degradation and master-slave delay caused by binlog logical replication in existing methods.

Preferably, according to the embodiment of the present invention, the redo log file after the initial consistent position point is obtained through a communication protocol and communication request agreed upon by the slave node and the master node of the database.

Step S202, traversing and obtaining the redo log file in the cache, performing classification and parsing on the redo log file according to the log type, and extracting the identification information and the log data.

Specifically, according to the embodiment of the present invention, parsing of the redo log file in the cache follows the design rules of the InnoDB storage engine: with the mtr (mini-transaction, also called a physical transaction) as the operation unit and the mini-log (MLOG) as the parsing unit, the redo log file of the master node is parsed, the log data is stored into a physical transaction, and the physical transaction is written into the local redo log file.

After the step of storing the redo log file into the cache corresponding to the slave node, a parsing thread (log parser) is activated to parse the redo log. According to a specific implementation manner of the embodiment of the invention, the parsing thread continuously traverses the redo log file in the cache. If there is no data (that is, no redo log file exists in the cache), the thread sleeps and waits. If a redo log file exists, the thread obtains it, performs classification and parsing according to the log type, extracts the space_id (tablespace number) and page_no (data page number), which together form the identification information, stores the log data into a physical transaction, and writes the physical transaction into the local redo log file. With this arrangement, the redo log file of the master node is parsed and reconstructed into locally usable physical transactions, which lays the foundation for the slave node to subsequently apply the data of the physical transactions, while significantly reducing the data synchronization delay between the master node and the slave node of the database and reducing the data synchronization cost.
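The wait-then-parse behavior of the parsing thread can be sketched as follows. The record layout used here (log type, space_id, page_no, body length, body) is a hypothetical format chosen only for illustration and is not InnoDB's actual redo record encoding; the function names are likewise assumptions.

```python
import queue
import struct
import threading

# Hypothetical record layout for illustration only (not InnoDB's real format):
# log type (1 byte) | space_id (4) | page_no (4) | body length (2) | body
HEADER = struct.Struct(">BIIH")


def encode(log_type, space_id, page_no, body):
    return HEADER.pack(log_type, space_id, page_no, len(body)) + body


def parse_records(buf):
    """Yield (log_type, space_id, page_no, body) for each record in buf."""
    offset = 0
    while offset < len(buf):
        log_type, space_id, page_no, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        yield log_type, space_id, page_no, buf[offset:offset + length]
        offset += length


def log_parser(cache, parsed, stop):
    """Parsing thread: wait while the cache is empty, parse whatever arrives."""
    while not stop.is_set():
        try:
            buf = cache.get(timeout=0.05)  # blocks (sleeps) while there is no data
        except queue.Empty:
            continue
        parsed.extend(parse_records(buf))
        cache.task_done()


cache, parsed, stop = queue.Queue(), [], threading.Event()
worker = threading.Thread(target=log_parser, args=(cache, parsed, stop))
worker.start()
cache.put(encode(1, 7, 42, b"abc") + encode(2, 7, 43, b"de"))
cache.join()   # wait until the parser has consumed the buffer
stop.set()
worker.join()
```

Each parsed tuple carries the identification information (space_id, page_no) next to the log data, which is what the later apply step needs to locate the target data page.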

Step S203, storing the log data into the physical transaction, and determining the physical transaction corresponding to the single log type and the physical transaction corresponding to the multiple log types.

Step S204, allocating log sequence numbers to the physical transactions corresponding to the single log type according to the fixed log data length, and allocating log sequence numbers to the physical transactions corresponding to the multiple log types according to the actual log data length.

According to the embodiment of the invention, after the redo log file is parsed with the mini-log as the parsing unit, the resulting physical transactions are divided into single-log-type and multi-log-type physical transactions. A single-log-type physical transaction is directly allocated a log sequence number corresponding to its log data length. For a multi-log-type physical transaction, the length to be allocated is accumulated from the length of each parsing unit during parsing, and once all mini-log units have been parsed, the corresponding log sequence number is allocated according to the actual accumulated log data length.

Step S205, writing the log data into the local redo log file according to the log sequence number corresponding to the physical transaction.

With this arrangement, the log sequence number pre-allocation technique is combined with the log system design of the storage engine for physical transactions: a log sequence number is pre-allocated to each physical transaction to be applied (applying here refers to applying the physical data pages in the physical transaction), which ensures that the physical transactions are subsequently applied contiguously and in order.
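Steps S203 to S205 can be sketched as follows, assuming that an LSN is a byte position, that a single-log-type physical transaction holds exactly one record whose length is fixed by its log type, and that a multi-log-type physical transaction accumulates the lengths of its records. The `FIXED_LENGTH` table and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical fixed body length per log type, for illustration only.
FIXED_LENGTH = {1: 4}


@dataclass
class PhysicalTransaction:
    records: list = field(default_factory=list)  # [(log_type, body bytes)]
    start_lsn: int = -1
    end_lsn: int = -1


class LsnAllocator:
    def __init__(self, next_lsn):
        self.next_lsn = next_lsn

    def allocate(self, mtr):
        if len(mtr.records) == 1:
            # Single log type: the length is known up front from the type.
            length = FIXED_LENGTH[mtr.records[0][0]]
        else:
            # Multiple log types: accumulate the length of each parsing unit.
            length = 0
            for _, body in mtr.records:
                length += len(body)
        mtr.start_lsn = self.next_lsn
        mtr.end_lsn = self.next_lsn + length
        self.next_lsn = mtr.end_lsn  # the next transaction starts where this one ends
        return mtr


def write_local(local_log, base_lsn, mtr):
    """Write the log data at the position given by its pre-allocated LSN range."""
    data = b"".join(body for _, body in mtr.records)
    local_log[mtr.start_lsn - base_lsn:mtr.end_lsn - base_lsn] = data


alloc = LsnAllocator(next_lsn=100)
single = PhysicalTransaction([(1, b"AAAA")])
multi = PhysicalTransaction([(2, b"bbb"), (3, b"ccccc")])
local_log = bytearray()
for mtr in (single, multi):
    alloc.allocate(mtr)
    write_local(local_log, 100, mtr)
```

Because every physical transaction receives a contiguous, non-overlapping LSN range before it is written, the local redo log stays gapless and ordered regardless of how many records each transaction contains.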

Step S206, according to the corresponding relation between the identification information in the hash table and the physical transaction, obtaining a physical data page corresponding to the physical transaction from the local redo log file.

The redo log is typically a physical log: it records the physical modifications made to data pages rather than which row or rows were modified, and it is used to restore committed physical data pages (a data page is restored only up to its last committed position).

According to the embodiment of the present invention, after the step of writing the log data into the local redo log file, an apply thread is activated to apply the physical transactions to the buffer pool. Specifically, each parsed physical transaction is stored in the hash table keyed by tablespace number and data page number. The physical data page can therefore be looked up in the cache by tablespace number and data page number: if the physical data page is present, the physical transaction is applied to it directly; if it is not present, the physical data page is first read from the disk. This arrangement improves the scalability of data synchronization and the system performance of the database.
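The lookup-and-apply logic can be sketched as follows, assuming that each parsed change is a tuple of (space_id, page_no, offset within the page, data) and that a dictionary stands in for the data files on disk. The `BufferPool` class and the helper functions are hypothetical simplifications.

```python
from collections import defaultdict


class BufferPool:
    def __init__(self, disk):
        self.disk = disk      # {(space_id, page_no): bytes}, stands in for data files
        self.pages = {}       # pages currently held in memory
        self.disk_reads = 0

    def get_page(self, key):
        if key not in self.pages:                 # miss: read the page from disk
            self.pages[key] = bytearray(self.disk[key])
            self.disk_reads += 1
        return self.pages[key]                    # hit: use the cached page directly


def build_hash_table(records):
    """Group changes by (space_id, page_no), keeping log order within each page."""
    table = defaultdict(list)
    for space_id, page_no, offset, data in records:
        table[(space_id, page_no)].append((offset, data))
    return table


def apply_all(table, pool):
    for key, changes in table.items():
        page = pool.get_page(key)                 # one lookup per page, not per record
        for offset, data in changes:
            page[offset:offset + len(data)] = data


disk = {(1, 0): bytes(8), (1, 1): bytes(8)}
records = [(1, 0, 0, b"ab"), (1, 1, 2, b"cd"), (1, 0, 1, b"XY")]
pool = BufferPool(disk)
apply_all(build_hash_table(records), pool)
```

Grouping by page means that the two changes to page (1, 0) are applied with a single page lookup, even though another page's change sits between them in the log.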

Step S207, corresponding data processing is performed according to the physical data page.

Specifically, as shown in FIG. 2b, the Master node is the master node of the database, the Slave node is the slave node of the database, log parser is the parsing thread, and log applet is the apply thread.

According to the embodiment of the invention, the hash table is unordered, whereas the redo log file must be applied in order. The advantage of the hash table is that data can be applied grouped by tablespace number and data page number, which reduces how often a physical data page in the buffer pool is updated and reduces the number of disk reads. Alternatively, in view of the ordering required when applying the redo log file, a linear apply technique may be used instead; that is, the physical transactions are stored in a linear storage structure rather than in a hash table structure.

Further, according to an embodiment of the present invention, in a case where the local redo log file generates a write operation, the method further includes:

intercepting a write operation, and determining an operation source of the write operation;

correcting the write status according to the operation source.

Any MySQL instance needs to store the metadata of its own database. Since MySQL 8.0, MySQL metadata is stored entirely by the InnoDB storage engine, so all metadata modifications are written into the redolog. If the local redolog could be written freely, the changes to local physical data pages would become inconsistent with the physical data page changes recorded in the redolog of the master node. In mild cases this inconsistency causes data application to fail; in severe cases it directly corrupts physical files and makes the database unavailable. Therefore, with this arrangement, the consistency of node data is ensured by the dual guarantee of external interception and internal verification, which ensures that the local redo log file of the slave node remains applicable and improves the scalability of data synchronization.
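One possible reading of this dual guarantee is sketched below. The sketch assumes that "correcting the write status" means refusing writes that do not originate from replication, and that internal verification means checking that each record continues exactly at the current end LSN; the source does not spell out either mechanism, so both are assumptions, as are the names `WriteSource` and `GuardedRedoLog`.

```python
from enum import Enum


class WriteSource(Enum):
    REPLICATION = "replication"  # redo log parsed from the master node
    LOCAL = "local"              # for example, a local metadata change


class GuardedRedoLog:
    def __init__(self, start_lsn):
        self.end_lsn = start_lsn
        self.data = bytearray()
        self.rejected = []

    def write(self, source, lsn, body):
        # External interception: every write is routed through this method,
        # and its operation source is checked first.
        if source is not WriteSource.REPLICATION:
            self.rejected.append((source, lsn))
            return False
        # Internal verification: the record must continue exactly at end_lsn.
        if lsn != self.end_lsn:
            self.rejected.append((source, lsn))
            return False
        self.data += body
        self.end_lsn += len(body)
        return True


log = GuardedRedoLog(start_lsn=100)
results = [
    log.write(WriteSource.REPLICATION, 100, b"abcd"),  # accepted
    log.write(WriteSource.LOCAL, 104, b"zz"),          # refused: local source
    log.write(WriteSource.REPLICATION, 110, b"gap"),   # refused: LSN gap
    log.write(WriteSource.REPLICATION, 104, b"ef"),    # accepted
]
```

Routing every write through a single guarded entry point keeps the local redo log byte-for-byte aligned with what the master node produced.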

Preferably, according to an embodiment of the present invention, the method further includes:

correcting the transaction state of the slave node corresponding to the undo log by at least one of the following: prohibiting the regeneration of slave node transactions, and cleaning the rollback queue.

The undo tablespace (undospace), also referred to as the undo log (undolog), is an important means by which InnoDB guarantees data transactions. The undolog and the redolog work together to ensure the consistency of database transactions during commit, rollback, and crash recovery. However, executing the undolog also generates redolog, and if this happens on the slave node while redolog data is being applied, free writes to the local redolog inevitably result. With this arrangement, in addition to external interception and internal verification, the state of the undo tablespace is corrected by prohibiting the slave node from regenerating transactions and by cleaning the rollback queue. According to an embodiment of the present invention, after a slave node restarts, incomplete transactions would normally be regenerated and the undolog then executed; this regeneration is prohibited, because the content that the undo operation would produce can be obtained by applying the redolog. The same reasoning applies to cleaning the rollback queue. With this arrangement, the state of the slave node is kept consistent with the state of the tablespace, which in turn keeps the log sequence numbers consistent and significantly improves the system performance of the database.
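The two corrective measures can be illustrated with a small sketch of restart handling. It assumes that a node which is not a slave regenerates its incomplete transactions and queues them for rollback, while a slave node does neither; the `RecoveryState` structure and the function `recover_after_restart` are hypothetical and are not InnoDB functions.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryState:
    regenerated: list = field(default_factory=list)     # transactions rebuilt after restart
    rollback_queue: list = field(default_factory=list)  # transactions waiting for undo


def recover_after_restart(incomplete_trx_ids, is_slave, pending_rollbacks=()):
    state = RecoveryState(rollback_queue=list(pending_rollbacks))
    if is_slave:
        # Measure 1: do not regenerate incomplete transactions on the slave node.
        # Measure 2: clean the rollback queue so that no undo runs locally.
        # The effect of the undo arrives later through the master's redo log.
        state.rollback_queue.clear()
        return state
    state.regenerated = list(incomplete_trx_ids)
    state.rollback_queue.extend(incomplete_trx_ids)
    return state


primary_state = recover_after_restart([11, 12], is_slave=False)
slave_state = recover_after_restart([11, 12], is_slave=True, pending_rollbacks=[9])
```

On the slave node both lists end up empty, so no undo is executed locally and therefore no local redolog is generated by it.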

According to the technical solution of the embodiment of the invention, a redo log file of the database master node is obtained; the redo log file of the master node is parsed to obtain log data and corresponding identification information, the log data is stored into a physical transaction, and the physical transaction is written into a local redo log file; and a physical data page corresponding to the physical transaction is obtained from the local redo log file according to the identification information, and corresponding data processing is executed according to the physical data page. This addresses the technical problems of existing methods, namely long data synchronization delay between master and slave nodes, high data synchronization cost, poor scalability, and degraded database system performance, and achieves the technical effects of significantly reducing the data synchronization delay between the master node and the slave node of the database, reducing the data synchronization cost, and improving the scalability of data synchronization and the system performance of the database.

FIG. 3 is a schematic diagram of the main blocks of a data processing apparatus according to an embodiment of the present invention; as shown in fig. 3, a data processing apparatus 300 according to an embodiment of the present invention mainly includes:

the file obtaining module 301 is configured to obtain a redo log file of the database master node.

The redo log file (redolog) records the physical changes made to the actual data files (that is, where in a data file the data was changed). An actual physical data page is modified and flushed to disk only after the corresponding redolog has been stored on disk. This design makes MySQL data writes very safe: even if the database crashes midway through a write, the local MySQL instance can perform Crash Recovery using the redolog stored on disk.

Further, according to the embodiment of the present invention, the file obtaining module 301 is further configured to:

determining an initial consistent position point corresponding to a master node and a slave node of a database, obtaining a redo log file after a log sequence number corresponding to the initial consistent position point, and storing the redo log file into a cache corresponding to the slave node.

Redo log data cannot be applied to an inconsistent database, nor to a consistent database at an inconsistent position point. Maintaining the initial consistent position point is therefore the basis for implementing the present application (according to a specific implementation manner of an embodiment of the present invention, the initial consistent position point is obtained through the MySQL 8.0 clone function).

Specifically, according to the embodiment of the present invention, the LSN (Log Sequence Number, which indicates the position of a specific log record within the log file) of the consistent position point is recorded and taken as the initial LSN. The slave node of the database initiates a replication request to the master node to obtain the redolog starting from the initial LSN, records the end LSN of the obtained redolog, and stores the redo log file into the cache corresponding to the slave node. With this arrangement, the redolog of the master node database can, through the subsequent operations, be redone and applied on the slave node database in the manner of redolog physical replication to form database data, which addresses the system performance degradation and master-slave delay caused by binlog logical replication in existing methods.

Preferably, according to the embodiment of the present invention, the redo log file after the initial consistent position point is obtained through a communication protocol and a communication request agreed between the slave node and the master node of the database.

The parsing module 302 is configured to parse the redo log file of the master node to obtain log data and identification information corresponding to the log data, store the log data in a physical transaction, and write the physical transaction into the local redo log file.

Specifically, according to the embodiment of the present invention, parsing of the redo log file in the cache follows the design rules of the InnoDB storage engine: with the mtr (mini-transaction, also called a physical transaction) as the operation unit and the mini-log (MLOG) as the parsing unit, the redo log file of the master node is parsed, the log data is stored into a physical transaction, and the physical transaction is written into the local redo log file.

Further, according to an embodiment of the present invention, the parsing module 302 is further configured to:

traversing and acquiring the redo log file in the cache, performing classification analysis processing on the redo log file according to the log type, extracting identification information and log data, storing the log data into a physical transaction, and writing the physical transaction into a local redo log file; the identification information includes a table space number and a data page number.

After the step of storing the redo log file into the cache corresponding to the slave node, a parsing thread (log parser) is activated to parse the redo log. According to a specific implementation manner of the embodiment of the invention, the parsing thread continuously traverses the redo log file in the cache. If there is no data (that is, no redo log file exists in the cache), the thread sleeps and waits. If a redo log file exists, the thread obtains it, performs classification and parsing according to the log type, extracts the space_id (tablespace number) and page_no (data page number), which together form the identification information, stores the log data into a physical transaction, and writes the physical transaction into the local redo log file. With this arrangement, the redo log file of the master node is parsed and reconstructed into locally usable physical transactions, which lays the foundation for the slave node to subsequently apply the data of the physical transactions, while significantly reducing the data synchronization delay between the master node and the slave node of the database and reducing the data synchronization cost.

Preferably, according to an embodiment of the present invention, the parsing module 302 is further configured to:

storing the log data into a physical transaction, and determining the physical transaction corresponding to a single log type and the physical transaction corresponding to a plurality of log types;

allocating log sequence numbers to the physical transactions corresponding to a single log type according to the fixed log data length, and allocating log sequence numbers to the physical transactions corresponding to multiple log types according to the actual log data length;

and writing the log data into a local redo log file according to the log sequence number corresponding to the physical transaction.

According to the embodiment of the invention, after the redo log file is parsed with the mini-log as the parsing unit, the resulting physical transactions are divided into single-log-type and multi-log-type physical transactions. A single-log-type physical transaction is directly allocated a log sequence number corresponding to its log data length. For a multi-log-type physical transaction, the length to be allocated is accumulated from the length of each parsing unit during parsing, and once all mini-log units have been parsed, the corresponding log sequence number is allocated according to the actual accumulated log data length. With this arrangement, the log sequence number pre-allocation technique is combined with the log system design of the storage engine for physical transactions: a log sequence number is pre-allocated to each physical transaction to be applied (applying here refers to applying the physical data pages in the physical transaction), which ensures that the physical transactions are subsequently applied contiguously and in order.

The data processing module 303 is configured to obtain a physical data page corresponding to the physical transaction from the local redo log file according to the identification information, and execute corresponding data processing according to the physical data page.

Specifically, according to an embodiment of the present invention, the data processing module 303 is further configured to:

acquiring a physical data page corresponding to the physical transaction from the local redo log file according to the corresponding relation between the identification information and the physical transaction in the hash table; wherein the hash table is created after the step of storing the log data to the physical transaction.

According to the embodiment of the present invention, after the step of writing the log data into the local redo log file, an apply thread is activated to apply the physical transactions to the buffer pool. Specifically, each parsed physical transaction is stored in the hash table keyed by tablespace number and data page number. The physical data page can therefore be looked up in the cache by tablespace number and data page number: if the physical data page is present, the physical transaction is applied to it directly; if it is not present, the physical data page is first read from the disk. This arrangement improves the scalability of data synchronization and the system performance of the database.

According to the embodiment of the invention, the hash table is unordered, whereas the redo log file must be applied in order. The advantage of the hash table is that data can be applied grouped by tablespace number and data page number, which reduces how often a physical data page in the buffer pool is updated and reduces the number of disk reads. Alternatively, in view of the ordering required when applying the redo log file, a linear apply technique may be used instead; that is, the physical transactions are stored in a linear storage structure rather than in a hash table structure.

Further, according to an embodiment of the present invention, the data processing apparatus 300 further includes a writing status correction module, where in the case that the local redo log file generates a writing operation, the writing status correction module is configured to:

intercepting a write operation, and determining an operation source of the write operation;

correcting the write status according to the operation source.

Any MySQL instance needs to store the metadata of its own database. Since MySQL 8.0, MySQL metadata is stored entirely by the InnoDB storage engine, so all metadata modifications are written into the redolog. If the local redolog could be written freely, the changes to local physical data pages would become inconsistent with the physical data page changes recorded in the redolog of the master node. In mild cases this inconsistency causes data application to fail; in severe cases it directly corrupts physical files and makes the database unavailable. Therefore, with this arrangement, the consistency of node data is ensured by the dual guarantee of external interception and internal verification, which ensures that the local redo log file of the slave node remains applicable and improves the scalability of data synchronization.

Preferably, according to the embodiment of the present invention, the data processing apparatus 300 further includes a slave node transaction state correction module, configured to:

correct the transaction state of the slave node corresponding to the undo log by at least one of the following: prohibiting the regeneration of slave node transactions, and cleaning the rollback queue.

The undo tablespace (undospace), also referred to as the undo log (undolog), is an important means by which InnoDB guarantees data transactions. The undolog and the redolog work together to ensure the consistency of database transactions during commit, rollback, and crash recovery. However, executing the undolog also generates redolog, and if this happens on the slave node while redolog data is being applied, free writes to the local redolog inevitably result. With this arrangement, in addition to external interception and internal verification, the state of the undo tablespace is corrected by prohibiting the slave node from regenerating transactions and by cleaning the rollback queue. According to an embodiment of the present invention, after a slave node restarts, incomplete transactions would normally be regenerated and the undolog then executed; this regeneration is prohibited, because the content that the undo operation would produce can be obtained by applying the redolog. The same reasoning applies to cleaning the rollback queue. With this arrangement, the state of the slave node is kept consistent with the state of the tablespace, which in turn keeps the log sequence numbers consistent and significantly improves the system performance of the database.

According to the technical solution of the embodiment of the invention, a redo log file of the database master node is obtained; the redo log file of the master node is parsed to obtain log data and corresponding identification information, the log data is stored into a physical transaction, and the physical transaction is written into a local redo log file; and a physical data page corresponding to the physical transaction is obtained from the local redo log file according to the identification information, and corresponding data processing is executed according to the physical data page. This addresses the technical problems of existing methods, namely long data synchronization delay between master and slave nodes, high data synchronization cost, poor scalability, and degraded database system performance, and achieves the technical effects of significantly reducing the data synchronization delay between the master node and the slave node of the database, reducing the data synchronization cost, and improving the scalability of data synchronization and the system performance of the database.

Fig. 4 shows an exemplary system architecture 400 of a data processing method or data processing apparatus to which embodiments of the present invention may be applied.

As shown in FIG. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405 (this architecture is merely an example, and the components included in a particular architecture may be adapted to the specific application). The network 404 serves as the medium providing communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.

A user may use terminal devices 401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have installed thereon various communication client applications, such as a data processing application, a data synchronization application, a search application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only).

The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 405 may be a server that provides various services, such as a server (by way of example only) that supports the data processing performed by users using the terminal devices 401, 402, 403. The server may analyze and otherwise process received data such as the redo log file, and feed back a processing result (for example, the log data and the corresponding identification information; this is only an example) to the terminal device.

It should be noted that the data processing method provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the data processing apparatus is generally disposed in the server 405.

It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use with a terminal device or server implementing an embodiment of the invention is shown. The terminal device or the server shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.

The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprises a file acquisition module, an analysis module and a data processing module. The names of these modules do not constitute a limitation to the modules themselves in some cases, and for example, the file acquisition module may also be described as a "module for acquiring redo log files of a database master node".

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: acquiring a redo log file of a database master node; analyzing the redo log file of the master node to obtain log data and corresponding identification information, storing the log data into a physical transaction, and writing the physical transaction into a local redo log file; and acquiring a physical data page corresponding to the physical transaction from the local redo log file according to the identification information, and executing corresponding data processing according to the physical data page.
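The three operations carried by such a program can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the record layout, the use of a (space id, page number) pair as the identification information, the in-memory lists standing in for redo log files, and every name in it (`LogRecord`, `PhysicalTransaction`, `parse_master_redo`, `write_local_redo`, `apply_local_redo`, `synchronize`) are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumption: the identification information is a (space_id, page_no) pair
# naming one physical data page.
PageId = Tuple[int, int]


@dataclass
class LogRecord:
    lsn: int          # hypothetical log sequence number used for ordering
    page_id: PageId   # identification information of the target page
    payload: bytes    # log data to apply to that page


@dataclass
class PhysicalTransaction:
    # Hypothetical container holding the parsed log data.
    records: List[LogRecord] = field(default_factory=list)


def parse_master_redo(raw: List[Tuple[int, int, int, bytes]]) -> List[LogRecord]:
    """Parse raw (lsn, space_id, page_no, payload) entries from the master log."""
    return [LogRecord(lsn, (space, page), data) for lsn, space, page, data in raw]


def write_local_redo(local_log: List[LogRecord], txn: PhysicalTransaction) -> int:
    """Append the physical transaction to the local log; return its start offset."""
    start = len(local_log)
    local_log.extend(txn.records)
    return start


def apply_local_redo(local_log: List[LogRecord], start: int,
                     pages: Dict[PageId, bytearray]) -> None:
    """Locate each page by its identification information and apply the log data."""
    for rec in sorted(local_log[start:], key=lambda r: r.lsn):
        page = pages.setdefault(rec.page_id, bytearray())
        page.extend(rec.payload)  # stand-in for the real page modification


def synchronize(raw_master_log: List[Tuple[int, int, int, bytes]],
                local_log: List[LogRecord],
                pages: Dict[PageId, bytearray]) -> None:
    txn = PhysicalTransaction(parse_master_redo(raw_master_log))  # parse and store
    start = write_local_redo(local_log, txn)                      # write locally
    apply_local_redo(local_log, start, pages)                     # apply to pages
```

In this sketch, appending the payload to a `bytearray` merely stands in for whatever page-level processing an embodiment performs; only the newly written portion of the local log is applied on each call, so repeated calls do not reapply earlier records.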

According to the technical solution of the embodiments of the present invention, a redo log file of a database master node is acquired; the redo log file of the master node is analyzed to obtain log data and corresponding identification information, the log data is stored into a physical transaction, and the physical transaction is written into a local redo log file; and a physical data page corresponding to the physical transaction is acquired from the local redo log file according to the identification information, and corresponding data processing is executed according to the physical data page. This solves the technical problems of existing methods, namely long data synchronization delay between the master node and the slave node, high data synchronization cost, poor scalability, and degraded system performance of the database. It thereby achieves the technical effects of significantly reducing the data synchronization delay between the master node and the slave node of the database, reducing the data synchronization cost, improving the scalability of data synchronization, and improving the system performance of the database.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
