Method, device, equipment and computer readable medium for processing data

Document No.: 406833    Published: 2021-12-17

This technology, Method, device, equipment and computer readable medium for processing data, was created by Chen Hongjian (陈洪健), Qian Ye (钱叶) and Tu Zhiqiang (屠志强) on 2021-09-16. Abstract: The invention discloses a method, a device, equipment and a computer readable medium for processing data, and relates to the technical field of computers. One embodiment of the method comprises: deleting, according to a table partition in a data source, the data in the corresponding table partition of a version table, wherein the table partitions are preset according to business requirements; pushing the data of the table partition in the data source to the corresponding table partition in the version table, wherein the format of the version table is consistent with that of a formal table; and after the data in the corresponding table partition in the version table is successfully verified, copying that data into the formal table. This embodiment can shorten the blank window period of the data pushing process and reduce data fluctuation.

1. A method of processing data, comprising:

deleting, according to a table partition in a data source, the data in the corresponding table partition of a version table, wherein the table partitions are preset according to business requirements;

pushing the data of the table partition in the data source to the corresponding table partition in the version table, wherein the format of the version table is consistent with that of a formal table;

and after the data in the corresponding table partition in the version table is verified successfully, copying the data in the corresponding table partition in the version table into the formal table.

2. The method of claim 1, wherein verifying the data in the corresponding table partition in the version table comprises:

checking that the data volume of each fragment in the corresponding table partition in the version table is the same as the data volume of each fragment in the table partition in the data source;

and confirming that the business metric of the data in the corresponding table partition in the version table is within a business threshold range.

3. The method of processing data according to claim 2, wherein checking that the data volume of each fragment in the corresponding table partition in the version table is the same as the data volume of each fragment in the table partition in the data source comprises:

if the data volume of the corresponding table partition in the version table differs from the data volume of the table partition in the data source, determining the data volume of each fragment in the corresponding table partition in the version table to identify the fragments whose data volumes differ;

and re-pushing those fragments from the table partition in the data source to the version table until the check shows that the data volume of each fragment in the corresponding table partition in the version table is the same as the data volume of each fragment in the table partition in the data source.

4. The method of claim 3, wherein determining the data volume of each fragment in the corresponding table partition in the version table comprises:

performing hash processing, in sequence, on the high-dispersion field and the index field of each fragment in the version table to determine the data volume of each fragment in the corresponding table partition in the version table.

5. The method of processing data according to claim 1, wherein copying the data in the corresponding table partition in the version table into the formal table comprises:

checking that the partition in the formal table corresponding to the table partition in the data source stores no data;

and copying the data in the corresponding table partition in the version table into the formal table.

6. The method of processing data according to claim 1, wherein copying the data in the corresponding table partition in the version table into the formal table comprises:

checking that the partition in the formal table corresponding to the table partition in the data source stores data;

saving the stored data to a cache table, and then deleting the stored data from the formal table;

and copying the data in the corresponding table partition in the version table into the formal table.

7. The method of processing data according to claim 6, further comprising, after copying the data in the corresponding table partition in the version table into the formal table:

checking that the data volume of each fragment in the corresponding table partition in the formal table is the same as the data volume of each fragment in the table partition in the data source.

8. An apparatus for processing data, comprising:

a deleting module, configured to delete, according to a table partition in a data source, the data in the corresponding table partition of a version table, wherein the table partitions are preset according to business requirements;

a pushing module, configured to push the data of the table partition in the data source to the corresponding table partition in the version table, wherein the format of the version table is consistent with that of a formal table;

and a processing module, configured to copy the data in the corresponding table partition in the version table into the formal table after that data is successfully verified.

9. An electronic device for processing data, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

10. A computer-readable medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a computer-readable medium for processing data.

Background

Data pushing is an important link in a data pipeline. The currently common data-pushing method acquires the required data from a data source and imports it according to a partition field; if the partition already contains data, that data is deleted first and the new data is then pushed.

In the process of implementing the invention, the inventors found at least the following problem in the prior art: because existing data is deleted first and the new data then takes a period of time to push, the data has a blank window period and fluctuates severely.

Disclosure of Invention

Embodiments of the present invention provide a method, an apparatus, a device, and a computer-readable medium for processing data, which can shorten the blank window period of the data pushing process and reduce data fluctuation.

To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of processing data, including:

deleting, according to a table partition in a data source, the data in the corresponding table partition of a version table, wherein the table partitions are preset according to business requirements;

pushing the data of the table partition in the data source to the corresponding table partition in the version table, wherein the format of the version table is consistent with that of a formal table;

and after the data in the corresponding table partition in the version table is verified successfully, copying the data in the corresponding table partition in the version table into the formal table.

Verifying the data in the corresponding table partition in the version table comprises the following steps:

checking that the data volume of each fragment in the corresponding table partition in the version table is the same as the data volume of each fragment in the table partition in the data source;

and confirming that the business metric of the data in the corresponding table partition in the version table is within a business threshold range.

Checking that the data volume of each fragment in the corresponding table partition in the version table is the same as the data volume of each fragment in the table partition in the data source comprises:

if the data volume of the corresponding table partition in the version table differs from the data volume of the table partition in the data source, determining the data volume of each fragment in the corresponding table partition in the version table to identify the fragments whose data volumes differ;

and re-pushing those fragments from the table partition in the data source to the version table until the check shows that the data volume of each fragment in the corresponding table partition in the version table is the same as the data volume of each fragment in the table partition in the data source.

Determining the data volume of each fragment in the corresponding table partition in the version table comprises:

performing hash processing, in sequence, on the high-dispersion field and the index field of each fragment in the version table to determine the data volume of each fragment in the corresponding table partition in the version table.

Copying the data in the corresponding table partition in the version table into the formal table comprises:

checking that the partition in the formal table corresponding to the table partition in the data source stores no data;

and copying the data in the corresponding table partition in the version table into the formal table.

Alternatively, copying the data in the corresponding table partition in the version table into the formal table comprises:

checking that the partition in the formal table corresponding to the table partition in the data source stores data;

saving the stored data to a cache table, and then deleting the stored data from the formal table;

and copying the data in the corresponding table partition in the version table into the formal table.

After copying the data in the corresponding table partition in the version table into the formal table, the method further includes:

checking that the data volume of each fragment in the corresponding table partition in the formal table is the same as the data volume of each fragment in the table partition in the data source.

According to a second aspect of the embodiments of the present invention, there is provided an apparatus for processing data, including:

a deleting module, configured to delete, according to a table partition in a data source, the data in the corresponding table partition of a version table, wherein the table partitions are preset according to business requirements;

a pushing module, configured to push the data of the table partition in the data source to the corresponding table partition in the version table, wherein the format of the version table is consistent with that of a formal table;

and a processing module, configured to copy the data in the corresponding table partition in the version table into the formal table after that data is successfully verified.

According to a third aspect of embodiments of the present invention, there is provided an electronic device for processing data, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.

According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method as described above.

One embodiment of the invention described above has the following advantages or benefits. According to a table partition in the data source, the data in the corresponding table partition of the version table is deleted, the table partitions being preset according to business requirements. The data of the table partition in the data source is then pushed to the corresponding table partition in the version table, whose format is consistent with that of the formal table. After the data in that version-table partition is successfully verified, it is copied into the formal table. Because the data in the data source reaches the formal table through the version table, the blank window period of the data pushing process is shortened and data fluctuation is reduced.

Further effects of the above optional implementations are described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. In the drawings:

FIG. 1 is a schematic diagram of a main flow of a method of processing data according to an embodiment of the invention;

FIG. 2 is a flowchart of successfully verifying the data in a corresponding table partition of the version table according to an embodiment of the present invention;

FIG. 3 is a flowchart of checking that the data volumes are the same according to an embodiment of the present invention;

FIG. 4 is a flow chart illustrating the process of copying data in a corresponding table partition in a version table into a formal table according to an embodiment of the present invention;

FIG. 5 is a schematic flow chart of copying data in a corresponding table partition in a version table into a formal table according to another embodiment of the present invention;

FIG. 6 is a schematic flowchart of processing data with ClickHouse according to an embodiment of the invention;

FIG. 7 is a schematic diagram of a main structure of an apparatus for processing data according to an embodiment of the present invention;

FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 9 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Currently, the common way of pushing data is to first obtain the required data from the data source. ClickHouse, as a real-time analytical database, carries much of the query workload. Business data is therefore processed and stored in systems such as MySQL, Hive and Kafka, and the result data is then transferred into ClickHouse. Data pushing is consequently an important link in the data pipeline.

The data is then imported into ClickHouse according to the partition field in ClickHouse; if the partition already contains data, that data is deleted before the new data is pushed.

However, pushing data in this way leaves a blank window period and causes severe data fluctuation.

To solve the problems of the blank window period and severe data fluctuation, the following technical solution of the embodiments of the invention can be adopted.

Referring to FIG. 1, FIG. 1 is a schematic diagram of a main flow of a method for processing data according to an embodiment of the present invention, in which data in a data source is pushed into a formal table through a version table. As shown in FIG. 1, the method specifically comprises the following steps:

S101, deleting the data in the table partition of the version table that corresponds to the table partition in the data source, wherein the table partitions are preset according to business requirements.

In this embodiment, a version table is added to the data-pushing process, so that the data in the data source reaches the formal table through the version table. The version table has the same format as the formal table, so its data can be copied directly into the formal table for storage. As one example, the formal table is a ClickHouse formal table.

The data source may be, for example, Hive, MySQL or Kafka; the technical solution of the embodiments applies to various data sources.

The data source comprises a plurality of table partitions, which are preset according to business requirements. As an example, data related to business requirement 1 is stored in table partition A1 and data related to business requirement 2 is stored in table partition A2. Partitioning the data source in this way improves the efficiency of processing data.

The version table relays data from the data source to the formal table and likewise comprises a plurality of table partitions, each corresponding to a table partition in the data source. As one example, the table partitions in the data source include A1, A2 and A3, and the table partitions in the version table include B1, B2 and B3, where A1 corresponds to B1, A2 corresponds to B2, and A3 corresponds to B3.

Before the data of a table partition in the data source is pushed into the corresponding table partition in the version table, the existing data in that version-table partition is deleted, which improves the push success rate. Because the data previously held in the version table has usually already been copied into the formal table, deleting it does not affect the data served by the formal table.

As an example, the data source is Hive and the preset partition field is dt. According to dt in Hive, the data in the partition of the version table corresponding to dt is deleted.

S102, pushing the data of the table partitions in the data source to corresponding table partitions in the version table, wherein the format of the version table is consistent with that of the formal table.

After the data in the corresponding table partition of the version table is deleted, the data of the table partition in the data source can be pushed into it. Because the version table has the same format as the formal table, its data can later be copied directly into the formal table for storage.

Continuing the example in which the data source is Hive and the preset partition field is dt: after the data in the version-table partition corresponding to dt is deleted, the data of dt in Hive is pushed directly into that partition.
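Steps S101 and S102 can be pictured with a minimal Python sketch. It illustrates the idea under stated assumptions and is not the patented implementation. Each table is modelled as an in-memory dict that maps a partition value (a hypothetical dt string) to a list of rows, and the helper names are hypothetical. The sketch shows that only the version table is cleared and refilled, while the formal table is left alone during the push.

```python
# Sketch of S101/S102 with in-memory tables (illustrative assumption, not a real Hive/ClickHouse client).
from typing import Dict, List

Row = Dict[str, object]
Table = Dict[str, List[Row]]  # hypothetical model: partition value (e.g. a dt string) -> rows


def delete_version_partition(version: Table, partition: str) -> None:
    """S101: clear the version-table partition that corresponds to the source partition."""
    version.pop(partition, None)


def push_to_version(source: Table, version: Table, partition: str) -> int:
    """S102: copy the source partition into the version table; returns the number of rows pushed."""
    rows = [dict(r) for r in source.get(partition, [])]
    version[partition] = rows
    return len(rows)


if __name__ == "__main__":
    hive = {"2021-09-16": [{"id": 1, "pv": 10}, {"id": 2, "pv": 7}]}
    version = {"2021-09-16": [{"id": 99, "pv": 0}]}  # stale rows left by an earlier push
    formal = {"2021-09-16": [{"id": 98, "pv": 5}]}   # keeps serving queries meanwhile
    delete_version_partition(version, "2021-09-16")
    print(push_to_version(hive, version, "2021-09-16"))  # 2
    print(formal)  # not touched by S101 or S102
```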

S103, after the data in the corresponding table partition in the version table is verified successfully, the data in the corresponding table partition in the version table is copied into a formal table.

The data in the corresponding table partition of the version table is copied into the formal table only after it has been successfully verified, that is, only when it is consistent with the data in the table partition of the data source. This improves the accuracy of the data in the formal table.
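The precondition in S103 acts as a gate in front of the copy. In the hypothetical sketch below, `verify` is a placeholder for the checks of FIG. 2, and tables are modelled as in-memory dicts from a partition value to rows. Replacing the formal partition in a single assignment is a simplification. FIG. 4 and FIG. 5 describe how the copy is actually performed.

```python
from typing import Callable, Dict, List

Table = Dict[str, List[dict]]  # hypothetical model: partition value -> rows


def copy_if_verified(version: Table, formal: Table, partition: str,
                     verify: Callable[[List[dict]], bool]) -> bool:
    """S103: copy the version partition into the formal table only if it passes verification.

    Returns True when the formal table was updated, False when it was left untouched.
    """
    rows = version.get(partition, [])
    if not verify(rows):
        return False  # the formal table keeps serving its previous data
    formal[partition] = [dict(r) for r in rows]  # simplified copy; see FIG. 4 / FIG. 5
    return True
```

For example, `copy_if_verified(version, formal, "2021-09-16", lambda rows: len(rows) == expected_count)` would copy only when the row count matches a hypothetical `expected_count`.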

Referring to FIG. 2, FIG. 2 is a schematic flowchart of successfully verifying the data in the corresponding table partition of the version table according to an embodiment of the present invention, which specifically includes the following steps:

S201, checking that the data volume of each fragment in the corresponding table partition in the version table is the same as the data volume of each fragment in the table partition in the data source.

In this embodiment, the version table is first checked by data volume. Specifically, it is checked whether the volume of data pushed into the version table equals the volume of data output by the data source.

If the volume pushed into the version table equals the output volume of the data source, the data volume of each fragment in the corresponding table partition in the version table is determined to be the same as that of each fragment in the table partition in the data source.

If the two volumes are not equal, some data was either pushed more than once or missed during the push. To ensure data accuracy, the data must be re-pushed to restore consistency.

In the prior art, a re-push mostly sends the full output data of the data source again. For large data volumes, a full re-push is very costly in time, and because duplicate or missing pushes are likely to recur, the pushing process is prolonged further.

Practice has shown that when duplicate or missing pushes occur, the most likely cause is that the data of a few fragments went wrong for system reasons. In this embodiment, data is therefore rolled back and re-pushed in units of fragments, which greatly shortens the overall push cycle compared with a full re-push.

Referring to FIG. 3, FIG. 3 is a schematic flowchart of checking that the data volumes are the same according to an embodiment of the present invention, which specifically includes the following steps:

S301, if the data volume of the corresponding table partition in the version table differs from the data volume of the table partition in the data source, determining the data volume of each fragment in the corresponding table partition in the version table to identify the fragments whose data volumes differ.

First, the data in the corresponding table partition in the version table is divided into a plurality of fragments. The number of fragments is preset according to the nodes of the distributed cluster. The data volume is then checked fragment by fragment.

Hash processing is performed, in sequence, on the high-dispersion field and the index field of each fragment in the version table to determine the data volume of each fragment in the corresponding table partition. A high-dispersion field is a field whose values are spread widely enough that hashing on it splits the data evenly across fragments.

Because the high-dispersion field is evenly distributed across the fragments, data skew is avoided. Data skew means that the data is not dispersed enough, so that a large amount of it is concentrated on one or a few nodes.

As one example, the formal table is a ClickHouse formal table. When data is pushed to ClickHouse, it is divided according to the number of fragments. Hash processing according to Formula 1 gives each row a fragment number, and counting the rows with each fragment number gives the data volume of that fragment.

COALESCE(CAST(ABS(HASH(A, B, high_dispersion_field)) AS Int) % number_of_fragments, 0)    (Formula 1)

Here A and B denote the index fields. When the index fields are not dispersed enough, one or more high-dispersion fields are added to the hash input. HASH() computes a hash value, ABS() takes the absolute value of that hash, and CAST() converts the absolute value to the int type. COALESCE() returns 0 if the result of CAST() is null.
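The Python sketch below makes Formula 1 concrete by assigning each row to a fragment and counting the rows in each fragment. It emulates the formula under several assumptions and is not ClickHouse code. md5 stands in for the unspecified HASH() function, because Python's built-in hash() is randomized per process for strings and would not give reproducible fragment numbers. The first 8 bytes of the digest are read as a signed 64-bit integer to mimic a signed hash. A NULL (None) field is assumed to make the hash NULL, which COALESCE then maps to fragment 0. The field names in the usage line are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Sequence


def fragment_of(row: dict, fields: Sequence[str], fragment_count: int) -> int:
    """Assign a row to a fragment, mirroring Formula 1:
    COALESCE(CAST(ABS(HASH(A, B, high_dispersion_field)) AS Int) % number_of_fragments, 0)
    """
    values = [row.get(f) for f in fields]
    if any(v is None for v in values):
        # Assumption: SQL-style NULL propagation makes HASH() NULL; COALESCE maps it to 0
        return 0
    # md5 stands in for HASH(); unlike Python's hash(), it is stable across runs
    key = "\x1f".join(str(v) for v in values).encode("utf-8")
    signed = int.from_bytes(hashlib.md5(key).digest()[:8], "big", signed=True)  # mimic a signed int
    return abs(signed) % fragment_count


def fragment_counts(rows: Iterable[dict], fields: Sequence[str],
                    fragment_count: int) -> Dict[int, int]:
    """Data volume of each fragment: the number of rows assigned to that fragment number."""
    counts = Counter(fragment_of(r, fields, fragment_count) for r in rows)
    return {i: counts.get(i, 0) for i in range(fragment_count)}


if __name__ == "__main__":
    rows = [{"a": i, "b": i % 7, "user_id": f"u{i}"} for i in range(1000)]
    print(fragment_counts(rows, ["a", "b", "user_id"], 4))
```

Running the same function over the source partition and the version-table partition yields two comparable count maps.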

The data volume is determined for each fragment in the corresponding table partition of the version table and for each fragment in the table partition of the data source, using the same fragment assignment on both sides. Comparing the two then identifies the fragments whose data volumes differ.
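The comparison in S301 reduces to matching two maps from fragment number to row count. The helper below is a hypothetical sketch that follows the order described in S301. It compares the partition totals first and compares per-fragment counts only when the totals differ.

```python
from typing import Dict, List


def mismatched_fragments(version_counts: Dict[int, int],
                         source_counts: Dict[int, int]) -> List[int]:
    """S301 sketch: fragment numbers whose row counts differ between version table and source.

    Per-fragment counts are compared only when the partition totals differ.
    """
    if sum(version_counts.values()) == sum(source_counts.values()):
        return []
    ids = set(version_counts) | set(source_counts)
    return sorted(i for i in ids if version_counts.get(i, 0) != source_counts.get(i, 0))
```

Offsetting errors are a consequence of this order. If one fragment is over-pushed and another under-pushed by the same amount, the totals stay equal and the errors go undetected. Comparing per fragment unconditionally would catch them.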

S302, re-pushing the fragments whose data volumes differ from the table partition in the data source to the version table, until the check shows that the data volume of each fragment in the corresponding table partition in the version table is the same as the data volume of each fragment in the table partition in the data source.

After the fragments with differing data volumes in the corresponding table partition of the version table are located, only the data of those fragments needs to be pushed again from the data source, not all of the data. After a fragment is re-pushed, its data volume is checked again. If the data volume of the fragment in the version table now equals that of the fragment in the data source, the check is complete. Otherwise, the fragment is pushed again.
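The S302 loop can be sketched as follows. `push` is a hypothetical stand-in for the actual transfer of one fragment and may lose rows. `max_rounds` is an assumed safety limit that the text does not specify. Each round re-checks the counts and re-pushes only the fragments that still differ.

```python
from typing import Callable, Dict, List

Rows = List[dict]


def repush_fragments(source: Dict[int, Rows], version: Dict[int, Rows],
                     push: Callable[[Rows], Rows], max_rounds: int = 3) -> bool:
    """S302 sketch: re-push only mismatching fragments until every fragment count matches.

    `source` and `version` map fragment number -> rows of one table partition.
    Returns True once all counts match, False if they still differ after `max_rounds`.
    """
    for _ in range(max_rounds):
        bad = [f for f in source if len(version.get(f, [])) != len(source[f])]
        if not bad:
            return True
        for f in bad:
            version[f] = push(source[f])  # roll back this fragment and push it again
    return all(len(version.get(f, [])) == len(source[f]) for f in source)
```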

In the embodiment of FIG. 3, the fragments with differing data volumes are located and only their data is re-pushed. For data tables with large data volumes that are prone to push errors, this greatly improves the push success rate and reduces the time lost to re-pushing.

S202, confirming that the business metric of the data in the corresponding table partition in the version table is within the business threshold range.

After the data volume of each fragment is confirmed to be the same, it must be confirmed whether the business metric is within the business threshold range. Specifically, a business metric is selected and checked against a business threshold range.

As one example, the business metric is page views (PV). The PV values of the data in the corresponding table partition in the version table are summed with sum(), and the data of the previous time period is used to set the business threshold range. If the computed value is outside the range, the data is abnormal and the data in the version table is not copied into the formal table. If it is within the range, the data is normal and the data in the version table is copied into the formal table.
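The S202 check can be sketched as below. The text only says that the data of the previous time period defines the threshold range. Deriving that range as the previous period's PV total plus or minus a fixed fraction (`tolerance`, 20% here) is a hypothetical choice made for illustration.

```python
from typing import Iterable


def pv_within_threshold(current_pv: Iterable[int], previous_total: float,
                        tolerance: float = 0.2) -> bool:
    """S202 sketch: sum() the PV column and test it against a range built from the previous period.

    The +/- `tolerance` band around the previous period's total is an assumption.
    """
    total = sum(current_pv)
    low, high = previous_total * (1 - tolerance), previous_total * (1 + tolerance)
    return low <= total <= high
```

With a previous-period total of 600, a current total anywhere from 480 to 720 passes, so the version data would be copied. A total of 300 fails, so it would not be copied.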

In the embodiment of FIG. 2, the data of the version table is verified by checking both the data volume of each fragment and the business metric, which ensures the accuracy of the data in the version table.

After the data in the corresponding table partition in the version table is verified successfully, the data in the corresponding table partition in the version table can be copied into the formal table.

Referring to FIG. 4, FIG. 4 is a schematic flowchart of copying the data in a corresponding table partition of the version table into the formal table according to an embodiment of the present invention, which specifically includes the following steps:

S401, checking that the partition in the formal table corresponding to the table partition in the data source stores no data.

After the data in the corresponding table partition of the version table is successfully verified, it is checked whether the partition in the formal table corresponding to the table partition in the data source already stores data. This determines whether the data in the version table can be copied directly into the formal table.

If the partition in the formal table corresponding to the table partition in the data source stores no data, S402 is executed.

S402, copying the data in the corresponding table partition in the version table into the formal table.

After the check, the data in the corresponding table partition of the version table is copied into the formal table, adding the corresponding partition to the formal table. The copy can be performed with ATTACH.

This copy goes from a local table to a local table rather than through a distributed table. This reduces network overhead and further shortens the blank window period of the formal table's data update. As an example, with A_local as the local formal table and A_tmp_local as the local version table, the ATTACH statement is:

ALTER TABLE A_local ATTACH PARTITION (dt) FROM A_tmp_local;

In the embodiment of FIG. 4, copying locally from the version table to the formal table reduces network overhead, thereby shortening the blank window period of the formal table's data update.
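The decision in S401 and the copy in S402 can be sketched as a small Python helper that emits the ClickHouse statement to run on each node. The table names follow the ATTACH example above. Several details are hypothetical: the helper name, the partition value, and the assumption that the tables are partitioned by a dt column so that a quoted date literal identifies the partition. The formal partition's row count is assumed to come from a per-node query such as SELECT count() FROM A_local WHERE dt = '2021-09-16'.

```python
from typing import List


def plan_copy(formal: str, version: str, partition: str, formal_row_count: int) -> List[str]:
    """S401/S402 sketch: return the ATTACH statement to run on a node, or [] if the
    formal partition already holds data (the FIG. 5 path applies instead)."""
    if formal_row_count > 0:
        return []  # S401 not satisfied: the partition is not empty
    # S402: local-to-local copy; ATTACH PARTITION ... FROM leaves the version table intact
    return [f"ALTER TABLE {formal} ATTACH PARTITION '{partition}' FROM {version}"]


if __name__ == "__main__":
    for stmt in plan_copy("A_local", "A_tmp_local", "2021-09-16", formal_row_count=0):
        print(stmt)
```

In ClickHouse, ATTACH PARTITION ... FROM copies the partition without deleting it from the source table and adds to any data already in the target. The emptiness check in S401 therefore prevents duplicated rows.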

Referring to FIG. 5, FIG. 5 is a schematic flowchart of copying the data in a corresponding table partition of the version table into the formal table according to another embodiment of the present invention, which specifically includes the following steps:

S501, checking that the partition in the formal table corresponding to the table partition in the data source stores data.

If that partition in the formal table already stores data, the data in the version table cannot be copied directly into the formal table.

S502, saving the stored data into a cache table, and deleting the stored data.

When the partition in the formal table corresponding to the table partition in the data source stores data, the data in that formal-table partition is first transferred to the cache table and then deleted from the formal table.

As an example, the data in the corresponding formal-table partition is transferred to the cache table with ATTACH. This backs up the original data of the formal table, so that the data can be rolled back effectively if the formal table's data becomes abnormal.

In one embodiment of the invention, if the formal table is abnormal, data rollback is executed according to the data in the cache table.

After the backup of the formal table is complete, a DROP operation is performed on the corresponding partition of the formal table. Each node is then checked to confirm that the data in that partition has been completely deleted, so that leftover data cannot affect the new data.

S503, copying the data in the corresponding table partition in the version table into the formal table.

As in S402, after the check is complete, the data in the corresponding table partition of the version table is copied into the formal table, adding the corresponding partition to the formal table. The copy can be performed with ATTACH.

In the embodiment of FIG. 5, the existing data of the formal table is saved in the cache table so that data rollback is possible. The local copy from the version table to the formal table also reduces network overhead, thereby shortening the blank window period of the formal table's data update.
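The FIG. 5 sequence can be written down as the ordered ClickHouse statements that a node would run. This is a sketch under several assumptions. A_cache_local is a hypothetical local cache table with the same structure and partition key as the formal table, and the partition column is assumed to be dt. The first statement clears any earlier backup of the same partition. That step is an added precaution not stated in the text, needed because ATTACH PARTITION ... FROM appends to data already in the target.

```python
from typing import List


def plan_replace_with_backup(formal: str, version: str, cache: str,
                             partition: str, partition_column: str = "dt") -> List[str]:
    """FIG. 5 sketch: ordered statements for one node (all tables are node-local).

    `cache` is a hypothetical backup table with the same structure and partition key as `formal`.
    """
    p = f"'{partition}'"
    return [
        # Precaution (not in the text): drop an older backup of this partition,
        # because ATTACH PARTITION ... FROM appends to data already in the target
        f"ALTER TABLE {cache} DROP PARTITION {p}",
        # S502: back up the current formal partition into the cache table
        f"ALTER TABLE {cache} ATTACH PARTITION {p} FROM {formal}",
        # S502: delete the old data from the formal table
        f"ALTER TABLE {formal} DROP PARTITION {p}",
        # Confirm deletion on this node; the expected result is 0
        f"SELECT count() FROM {formal} WHERE {partition_column} = {p}",
        # S503: local-to-local copy of the verified version partition
        f"ALTER TABLE {formal} ATTACH PARTITION {p} FROM {version}",
    ]


if __name__ == "__main__":
    for stmt in plan_replace_with_backup("A_local", "A_tmp_local", "A_cache_local", "2021-09-16"):
        print(stmt)
```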

Finally, the data volume of the corresponding table partition in the formal table is checked against the data volume of the table partition in the data source. If the two are the same, the data push from the data source to the formal table is complete. If they differ, the data is rolled back from the cache table and the push process exits.
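The final check and the rollback can be sketched in Python as follows. The per-fragment counts are assumed to be computed as in FIG. 3. Tables are modelled as in-memory dicts from a partition value to rows, and the function name is hypothetical. In the FIG. 4 path the formal partition was empty before the push, so the cache table holds no backup for it. For that case the sketch assumes that rollback simply removes the newly copied partition.

```python
from typing import Dict, List

Table = Dict[str, List[dict]]  # hypothetical model: partition value -> rows


def finalize_or_rollback(formal: Table, cache: Table, partition: str,
                         formal_counts: Dict[int, int], source_counts: Dict[int, int]) -> bool:
    """Accept the push if every fragment count matches the source; otherwise roll back.

    Returns True when the push is accepted, False when the formal partition was restored.
    """
    ids = set(formal_counts) | set(source_counts)
    if all(formal_counts.get(i, 0) == source_counts.get(i, 0) for i in ids):
        return True
    backup = cache.get(partition)
    if backup is None:
        formal.pop(partition, None)  # assumed FIG. 4 case: nothing existed before the push
    else:
        formal[partition] = [dict(r) for r in backup]  # data rollback from the cache table
    return False
```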

In one embodiment of the invention, to avoid re-pushing the whole data source to the formal table when the data volume check fails, the data volume of the corresponding table partition in the formal table may be checked against that of the table partition in the data source fragment by fragment.

Specifically, the data volume of each fragment in the corresponding table partition of the formal table is checked against the data volume of the same fragment in the table partition of the data source. The comparison follows the technical scheme of the embodiment of fig. 3.
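The fig. 3 comparison scheme is not reproduced in this section, so the following is only a plain sketch of a per-fragment count comparison. It assumes the per-fragment row counts of both sides have already been collected into hypothetical `{fragment_id: row_count}` dictionaries; a fragment present on only one side counts as a mismatch.

```python
from __future__ import annotations

from typing import Dict, List


def mismatched_fragments(target: Dict[int, int], source: Dict[int, int]) -> List[int]:
    """Return the ids of fragments whose row counts differ between target and source."""
    all_ids = set(target) | set(source)
    # A fragment missing on one side is treated as having 0 rows there.
    return sorted(i for i in all_ids if target.get(i, 0) != source.get(i, 0))
```

Only the returned fragments need to be rolled back or pushed again; an empty list means the partition passes the check.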

In the embodiment of the present invention, the data of the corresponding table partition in the version table is deleted according to the table partition in the data source, the table partition being preset according to service requirements; the data of the table partition in the data source is pushed to the corresponding table partition in the version table, the format of the version table being consistent with that of the formal table; and after the data in the corresponding table partition of the version table is verified successfully, it is copied into the formal table. Because the data in the data source reaches the formal table by way of the version table, the blank window period of the data pushing process is shortened and data fluctuation is reduced.

The embodiment of the invention offers two benefits. First, because a version table is established, the formal table is not left being updated by a long-running data push, which greatly reduces how much users perceive the push process. Second, because data is verified per fragment, a rollback can be performed one fragment at a time, avoiding the data fluctuation caused by rolling back the full data set.

Referring to fig. 6, fig. 6 is a schematic flow chart of ClickHouse processing data according to an embodiment of the present invention.

In FIG. 6, the data source is Hive and the preset table partition is dt. A step that pushes data to the version table is added to the process of pushing data from Hive to the ClickHouse formal table.

First, the data of the corresponding dt partition in the version table is deleted according to dt in Hive. Then, the data in Hive is extracted and pushed directly to the version table. After the data volume verification and the service index verification of the corresponding partition in the version table both succeed, the data is pushed to the ClickHouse formal table.

The data volume check is performed with the fragment as the unit, and when the check of a fragment fails, only that fragment's data is rolled back, which greatly shortens the data push cycle.

In addition, a cache table is set up to store original data of the formal table. When the data of the formal table is abnormal, the data rollback can be executed to restore the formal table.
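The fig. 6 flow can be simulated end to end in memory. This is a minimal sketch under stated assumptions: the Hive source, version table, formal table, and cache table are hypothetical dictionaries keyed by `dt`; `transfer` stands in for the extract-and-push job (and may lose rows); and `index_ok` stands in for the service index verification. For brevity the count check compares whole partitions rather than individual fragments.

```python
from __future__ import annotations

from typing import Callable, Dict, List

Rows = List[dict]


def push_partition(dt: str,
                   hive: Dict[str, Rows],
                   version: Dict[str, Rows],
                   formal: Dict[str, Rows],
                   cache: Dict[str, Rows],
                   index_ok: Callable[[Rows], bool],
                   transfer: Callable[[Rows], Rows] = list) -> bool:
    """Push one dt partition from Hive to the formal table through the version table."""
    source_rows = hive.get(dt, [])
    version.pop(dt, None)                 # 1. delete the dt partition in the version table
    version[dt] = transfer(source_rows)   # 2. push the Hive partition into the version table
    staged = version[dt]
    # 3. verify data volume and service indexes; on failure the formal table is untouched
    if len(staged) != len(source_rows) or not index_ok(staged):
        return False
    if formal.get(dt):                    # 4. back up existing formal data, then delete it
        cache[dt] = formal.pop(dt)
    formal[dt] = list(staged)             # 5. copy the verified partition into the formal table
    return True
```

Because a failed verification returns before step 4, users querying the formal table never see a partially pushed partition.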

Referring to fig. 7, fig. 7 is a schematic diagram of a main structure of an apparatus for processing data according to an embodiment of the present invention, where the apparatus for processing data may implement a method for processing data, and as shown in fig. 7, the apparatus for processing data specifically includes:

a deleting module 701, configured to delete data corresponding to a table partition in a version table according to the table partition in the data source, where the table partition is preset according to a service requirement;

a pushing module 702, configured to push data of a table partition in a data source to a corresponding table partition in the version table, where a format of the version table is consistent with a format of a formal table;

the processing module 703 is configured to copy the data in the corresponding table partition in the version table to the formal table after the data in the corresponding table partition in the version table is successfully verified.

In an embodiment of the present invention, the processing module 703 is specifically configured to check that the data volume of each fragment in the corresponding table partition of the version table is the same as the data volume of each fragment in the table partition of the data source;

and to confirm that the service indexes of the data in the corresponding table partition of the version table fall within the service threshold range.
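The two verification conditions can be combined into a single predicate. In this sketch the fragment counts are hypothetical `{fragment_id: row_count}` dictionaries, and the service index names and threshold ranges (for example an order-amount total) are illustrative assumptions; the source does not specify which indexes are used.

```python
from __future__ import annotations

import math
from typing import Dict, Tuple


def verify_version_partition(version_counts: Dict[int, int],
                             source_counts: Dict[int, int],
                             indexes: Dict[str, float],
                             thresholds: Dict[str, Tuple[float, float]]) -> bool:
    """True only if every fragment count matches and every service index is within its range."""
    ids = set(version_counts) | set(source_counts)
    counts_match = all(version_counts.get(i, 0) == source_counts.get(i, 0) for i in ids)
    # A missing index reads as NaN, and any comparison with NaN is False, so it fails the check.
    in_range = all(lo <= indexes.get(name, math.nan) <= hi
                   for name, (lo, hi) in thresholds.items())
    return counts_match and in_range
```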

In an embodiment of the present invention, the processing module 703 is specifically configured to, if the data volume of the corresponding table partition in the version table differs from the data volume of the table partition in the data source, determine the data volume of each fragment in the corresponding table partition of the version table, so as to identify the fragments whose data volumes differ;

and, for the fragments whose data volumes differ, push the corresponding fragments of the table partition in the data source to the version table again, until the data volume of each fragment in the corresponding table partition of the version table is checked to be the same as that of each fragment in the table partition of the data source.
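The re-push loop can be sketched as follows. `repush` is a hypothetical callback that deletes one fragment's rows from the version table, pushes that fragment again from the data source, and returns the fragment's new row count. The text describes the loop as running until the counts match; the `max_rounds` bound is an added assumption so that a persistently failing fragment cannot loop forever.

```python
from __future__ import annotations

from typing import Callable, Dict, List


def _mismatches(version_counts: Dict[int, int], source_counts: Dict[int, int]) -> List[int]:
    """Fragment ids whose row counts differ (a missing fragment counts as 0 rows)."""
    ids = set(version_counts) | set(source_counts)
    return sorted(i for i in ids if version_counts.get(i, 0) != source_counts.get(i, 0))


def repair_fragments(version_counts: Dict[int, int],
                     source_counts: Dict[int, int],
                     repush: Callable[[int], int],
                     max_rounds: int = 3) -> bool:
    """Re-push only the mismatched fragments until all fragment counts match."""
    for _ in range(max_rounds):
        bad = _mismatches(version_counts, source_counts)
        if not bad:
            return True
        for fragment_id in bad:
            version_counts[fragment_id] = repush(fragment_id)  # refresh the count after re-push
    return not _mismatches(version_counts, source_counts)
```

Only the fragments that failed are pushed again, which is what keeps the retry cheap compared with re-pushing the whole partition.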

In an embodiment of the present invention, the processing module 703 is specifically configured to hash, in sequence, the high-dispersion field and the index field of each fragment in the version table, so as to determine the data volume of each fragment in the corresponding table partition of the version table.
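A sketch of hash-based fragment assignment follows. How the embodiment combines the two fields is not spelled out here, so this version assumes the high-dispersion field and the index field are fed into MD5 one after the other and the digest is reduced modulo a hypothetical fragment count; the field names `user_id` and `order_id` used in testing are also illustrative. Whatever hash is chosen must be computed identically on the data source side and the version table side, otherwise the per-fragment counts are not comparable.

```python
from __future__ import annotations

import hashlib
from collections import Counter
from typing import Iterable, Mapping


def fragment_of(high_dispersion_value: str, index_value: str, num_fragments: int) -> int:
    """Map one row to a fragment by hashing its two key fields in sequence."""
    h = hashlib.md5()
    h.update(high_dispersion_value.encode("utf-8"))
    h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") hash differently
    h.update(index_value.encode("utf-8"))
    return int.from_bytes(h.digest()[:8], "big") % num_fragments


def fragment_counts(rows: Iterable[Mapping], high_field: str, index_field: str,
                    num_fragments: int) -> Counter:
    """Row count per fragment id, for comparison against the same computation on the source."""
    return Counter(fragment_of(str(r[high_field]), str(r[index_field]), num_fragments)
                   for r in rows)
```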

In an embodiment of the present invention, the processing module 703 is specifically configured to determine that the partition of the formal table corresponding to the table partition in the data source stores no data;

and copying the data in the corresponding table partition in the version table into the formal table.

In an embodiment of the present invention, the processing module 703 is specifically configured to determine that the partition of the formal table corresponding to the table partition in the data source already stores data;

transfer the stored data to a cache table, and then delete the stored data from the formal table;

and copying the data in the corresponding table partition in the version table into the formal table.
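The two branches (an empty formal partition versus one that already holds data) can be expressed as an ordered list of ClickHouse statements. This sketch only generates SQL text; the table names are hypothetical, `formal_has_data` is assumed to come from a prior query such as `SELECT count() FROM db.formal WHERE dt = '2021-09-16'`, and all three tables are assumed to share the structure, partition key, and storage policy that `ATTACH PARTITION ... FROM` requires.

```python
from __future__ import annotations

from typing import List


def publish_plan(formal: str, version: str, cache: str, partition: str,
                 formal_has_data: bool) -> List[str]:
    """Ordered statements that move one verified version-table partition into the formal table."""
    p = f"'{partition}'"  # partition value assumed to be a trusted literal such as 2021-09-16
    steps: List[str] = []
    if formal_has_data:
        steps.append(f"ALTER TABLE {cache} ATTACH PARTITION {p} FROM {formal}")  # back up
        steps.append(f"ALTER TABLE {formal} DROP PARTITION {p}")                 # clear old data
    steps.append(f"ALTER TABLE {formal} ATTACH PARTITION {p} FROM {version}")    # copy new data
    return steps
```

In both branches the last statement is the same copy from the version table; the non-empty branch only prepends the backup and the DROP.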

In an embodiment of the present invention, the processing module 703 is further configured to check that the data volume of each fragment in the corresponding table partition of the formal table is the same as the data volume of each fragment in the table partition of the data source.

Fig. 8 illustrates an exemplary system architecture 800 of a method of processing data or an apparatus for processing data to which embodiments of the present invention may be applied.

As shown in fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. Network 804 may include various types of connections, such as wired or wireless communication links, or fiber optic cables, to name a few.

A user may use the terminal devices 801, 802, 803 to interact with a server 805 over a network 804 to receive or send messages or the like. The terminal devices 801, 802, 803 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).

The terminal devices 801, 802, 803 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 805 may be a server that provides various services, such as a back-office management server (for example only) that supports shopping-like websites browsed by users using the terminal devices 801, 802, 803. The back-office management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example, target push information or product information; these are only examples) to the terminal device.

It should be noted that the method for processing data provided by the embodiment of the present invention is generally executed by the server 805, and accordingly, the apparatus for processing data is generally disposed in the server 805.

It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.

The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is installed into the storage section 908 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 901.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a deletion module, a push module, and a processing module. The names of the modules do not form a limitation on the modules themselves in some cases, for example, the deletion module may also be described as "configured to delete data corresponding to a table partition in the version table according to the table partition in the data source, where the table partition is preset according to the service requirement".

As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform:

deleting data corresponding to the table partitions in the version table according to the table partitions in the data source, wherein the table partitions are preset according to service requirements;

pushing data of a table partition in a data source to a corresponding table partition in the version table, wherein the format of the version table is consistent with that of a formal table;

and after the data in the corresponding table partition in the version table is verified successfully, copying the data in the corresponding table partition in the version table into the formal table.

According to the technical solution of the embodiment of the invention, the data of the corresponding table partition in the version table is deleted according to the table partition in the data source, the table partition being preset according to service requirements; the data of the table partition in the data source is pushed to the corresponding table partition in the version table, the format of the version table being consistent with that of the formal table; and after the data in the corresponding table partition of the version table is verified successfully, it is copied into the formal table. Routing the data from the data source to the formal table through the version table shortens the blank window period of the data pushing process and reduces data fluctuation.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
