Data processing method, node determination method, device, equipment and medium

Document No.: 95810; Publication date: 2021-10-12

Abstract: This technology, "Data processing method, node determination method, device, equipment and medium" (数据处理方法、节点确定方法、装置、设备及介质), was designed by 周涌王涛 on 2020-03-19. The invention discloses a data processing method, a node determination method, an apparatus, a device and a medium. The node determination method comprises: first, acquiring a back-brushing node set related to data updating and configuration information of the back-brushing nodes in the back-brushing node set; next, determining, according to the configuration information, a target data table having a dependency relationship with the back-brushing node; then, when it is detected that the target data table meets a first preset condition, displaying a target node corresponding to the target data table so that a user can add the target node to the back-brushing node set. In this way, the efficiency of data back-brushing is improved.

1. A node determination method, comprising:

acquiring a back-brushing node set related to data updating and configuration information of back-brushing nodes in the back-brushing node set;

determining a target data table according to the configuration information, wherein the target data table and the back-brushing node have a dependency relationship;

and when detecting that the target data table meets a first preset condition, displaying a target node corresponding to the target data table so that a user adds the target node to the back-brushing node set.

2. The method of claim 1, wherein the obtaining a set of back-brushing nodes related to data updates comprises:

acquiring a starting node and a terminating node in a plurality of nodes;

screening out a back-brushing node which is positioned between the starting node and the terminating node and is related to data updating according to the starting node and the terminating node;

and generating the back-brushing node set according to the back-brushing nodes.

3. The method of claim 1, wherein the obtaining a set of back-brushing nodes related to data updates comprises:

acquiring a first node in a plurality of nodes;

screening an upstream node and/or a downstream node of the first node from the plurality of nodes;

taking the upstream node and/or the downstream node as the back-brushing node;

and generating the back-brushing node set according to the back-brushing nodes.

4. The method of claim 1, wherein acquiring the configuration information of the back-brushing nodes in the back-brushing node set comprises:

displaying target options corresponding to the configuration information;

in response to a selection input of a user on a target option, acquiring the configuration information corresponding to the target option, wherein the configuration information comprises a back-brushing period, a scheduling type, a concurrency and a scheduling time; wherein

the back-brushing period is the time corresponding to the data update and comprises a plurality of discontinuous time intervals; the scheduling type is used for determining the back-brushing form of the data update; the concurrency comprises the maximum number of node instances allowed to run simultaneously during the data update; and the scheduling time is the time interval in which the data update is performed.

5. The method of claim 1, wherein prior to said displaying a target node corresponding to the target data table, the method further comprises:

detecting whether the target data table meets a first preset condition or not;

wherein, the detecting whether the target data table meets a first preset condition includes:

determining a first partition corresponding to the configuration information in the target data table;

detecting whether missing data exists in the data in the first partition;

and if the data in the first partition is missing, the target data table meets a first preset condition.

6. The method of claim 1, wherein after said displaying a target node corresponding to the target data table, the method further comprises:

adding the target node into the back-brushing node set to obtain a new back-brushing node set;

updating the data in a first data table through the new back-brushing nodes in the new back-brushing node set to obtain result data; wherein

the first data table comprises a data table having a dependency relationship with the new back-brushing node.

7. The method of claim 6, wherein the updating the data in the first data table through the new back-brushing nodes in the new back-brushing node set to obtain the result data comprises:

adjusting the back-brushing code of the new back-brushing node according to the configuration information to obtain a target back-brushing code;

and updating the data in the first data table according to the target back-brushing code to obtain the result data.

8. The method of claim 7, wherein the adjusting the back-brushing code of the new back-brushing node according to the configuration information to obtain a target back-brushing code comprises:

when the configuration information indicates that data of a plurality of time intervals is to be updated, adjusting the date partition field of the back-brushing code to obtain a target back-brushing code comprising a dynamic partition field;

when the configuration information comprises data indicating that at least one column of the first data table is updated, adjusting the back-brushing code to obtain a target back-brushing code comprising the column code of the at least one column;

when the configuration information comprises data indicating that at least one row of the first data table is updated, adjusting the back-brushing code to obtain a target back-brushing code comprising the row code of the at least one row.

9. The method of claim 7, wherein before updating the data in the first data table according to the target back-brushing code to obtain the result data, the method further comprises:

determining node instance information of the new back-brushing node;

and the updating the data in the first data table according to the target back-brushing code to obtain the result data comprises:

updating the data in the first data table according to the target back-brushing code and the node instance information to obtain the result data.

10. The method of claim 9, wherein the node instance information includes instance dependencies; the determining node instance information of the new back-brushing node includes:

acquiring a first dependency relationship of a data table generated by actual operation of the new back-brushing node;

and determining the instance dependency relationship of the new back-brushing node according to the first dependency relationship and a second dependency relationship recorded in the new back-brushing node.

11. The method of claim 9 or 10, wherein the node instance information comprises instance resource information; the determining node instance information of the new back-brushing node includes:

acquiring a node instance for operating the new back-brushing node when data is updated;

and when the resources occupied by the node instances meet a second preset condition, determining that the information corresponding to the resources occupied by the node instances is instance resource information.

12. A node determination method, comprising:

acquiring configuration information related to the update data;

presenting at least one node associated with the configuration information;

receiving an operation of selecting a target node in the at least one node by a user;

in response to the operation, determining the target node as a back-brushing node, and adding the back-brushing node to a set of back-brushing nodes to update data in a target data table through the back-brushing nodes in the set of back-brushing nodes, the target data table having a dependency relationship with the back-brushing nodes.

13. A method of data processing, comprising:

acquiring a back-brushing node set related to data updating and configuration information of back-brushing nodes in the back-brushing node set;

determining a target data table according to the configuration information, wherein the target data table and the back-brushing node have a dependency relationship;

when the target data table is detected to meet a first preset condition, displaying a target node corresponding to the target data table;

when the target node is added to the back-brushing node set, obtaining a new back-brushing node set;

updating the data in a first data table through the new back-brushing nodes in the new back-brushing node set to obtain result data; wherein the first data table comprises a data table having a dependency relationship with the new back-brushing node.

14. A node determination apparatus, comprising:

the acquisition module is used for acquiring a back-brushing node set related to data updating and configuration information of back-brushing nodes in the back-brushing node set;

the processing module is used for determining a target data table according to the configuration information, and the target data table and the back-brushing node have a dependency relationship;

and the display module is used for displaying the target node corresponding to the target data table when the target data table is detected to meet a first preset condition, so that the user adds the target node to the back-brushing node set.

15. A node determination apparatus, comprising:

an acquisition module for acquiring configuration information related to the update data;

a display module for displaying at least one node associated with the configuration information;

the receiving module is used for receiving the operation that a user selects a target node from the at least one node;

and the processing module is used for responding to the operation, determining the target node as a back-brushing node, and adding the back-brushing node to a back-brushing node set so as to update data in a target data table through the back-brushing node in the back-brushing node set, wherein the target data table has a dependency relationship with the back-brushing node.

16. A data processing apparatus comprising:

the processing module is used for determining a target data table according to the configuration information, and the target data table and the back-brushing node have a dependency relationship;

the display module is used for displaying a target node corresponding to the target data table when the target data table is detected to meet a first preset condition;

the updating module is used for obtaining a new back-brushing node set when the target node is added to the back-brushing node set;

the processing module is further configured to update the data in a first data table through the new back-brushing nodes in the new back-brushing node set to obtain result data; wherein the first data table comprises a data table having a dependency relationship with the new back-brushing node.

17. A computing device, wherein the device comprises: a processor and a memory storing computer program instructions;

the processor, when executing the computer program instructions, implements the node determination method according to any one of claims 1 to 11, implements the node determination method according to claim 12, or implements the data processing method according to claim 13.

18. A computer storage medium having computer program instructions stored thereon, which when executed by a processor implement the node determination method of any one of claims 1-11, implement the node determination method of claim 12, or implement the data processing method of claim 13.

Technical Field

The present invention relates to the field of computers, and in particular, to a data processing method, a node determining method, an apparatus, a device, and a medium.

Background

Currently, the process of re-executing the computation code of a computing node on a computing platform to produce result data, and storing that result data in a data table, is referred to as "data back-brushing". For example, suppose a data table stores the result data of the last 30 days, and the computing node is scheduled to run every day and produce the data of the most recent day. If the original computation code is modified on a certain day and the data in the data table is required to be the result of the latest computation code, data back-brushing is needed; that is, the modified computation code is used to recompute and produce the result data of the last 30 days.
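The 30-day example above can be illustrated with a minimal sketch. It assumes the data table is modelled as an in-memory dictionary keyed by business date; `compute_v2` and `back_brush` are hypothetical names introduced only for this illustration and are not part of the described platform.

```python
from datetime import date, timedelta


def compute_v2(day: date) -> dict:
    """Hypothetical 'latest computation code': result data for one business date."""
    return {"day": day.isoformat(), "value": day.toordinal() % 7}


def back_brush(table: dict, compute, end_day: date, days: int = 30) -> dict:
    """Re-run `compute` for each of the last `days` dates and overwrite that date's data."""
    for offset in range(days):
        day = end_day - timedelta(days=offset)
        table[day.isoformat()] = compute(day)
    return table


# Assumed model of the data table: business date -> result data.
table = {}
back_brush(table, compute_v2, end_day=date(2020, 3, 19))
print(len(table))  # 30
```

Each of the 30 daily results is overwritten with the output of the modified code, which is the behaviour the paragraph describes.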

In the current data back-brushing approach, whether data is missing from the data table must be checked manually; if data is missing, the result of the data back-brushing is empty, and the data back-brushing has to be repeated. This approach is therefore time-consuming and labor-intensive, it is difficult to guarantee that the check is error-free, and the accuracy of the final result data is low.

Disclosure of Invention

The embodiment of the invention provides a data processing method, a node determination method, an apparatus, a device and a medium, and aims to improve the efficiency of data back-brushing.

In a first aspect, an embodiment of the present invention provides a node determining method, where the method may include:

acquiring a back-brushing node set related to data updating and configuration information of back-brushing nodes in the back-brushing node set;

determining a target data table according to the configuration information, wherein the target data table has a dependency relationship with the back-brushing node;

and when the target data table is detected to meet the first preset condition, displaying the target node corresponding to the target data table so that the user can add the target node to the back-brushing node set.

In a second aspect, an embodiment of the present invention provides a node determining method, where the method may include:

acquiring configuration information related to the update data;

presenting at least one node associated with the configuration information;

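receiving an operation of selecting a target node in the at least one node by a user;

in response to the operation, determining the target node as a back-brushing node, and adding the back-brushing node to a back-brushing node set, so as to update data in a target data table through the back-brushing nodes in the back-brushing node set, the target data table having a dependency relationship with the back-brushing node.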

In a third aspect, an embodiment of the present invention provides a data processing method, where the method may include:

acquiring a back-brushing node set related to data updating and configuration information of back-brushing nodes in the back-brushing node set;

determining a target data table according to the configuration information, wherein the target data table has a dependency relationship with the back-brushing node;

when the target data table is detected to meet a first preset condition, displaying a target node corresponding to the target data table;

when the target node is added to the back-brushing node set, a new back-brushing node set is obtained;

updating the data in a first data table through the new back-brushing nodes in the new back-brushing node set to obtain result data; wherein the first data table comprises data tables having a dependency relationship with the new back-brushing node.

In a fourth aspect, an embodiment of the present invention provides a node determining apparatus, where the apparatus may include:

an acquisition module for acquiring configuration information related to the update data;

a display module for displaying at least one node associated with the configuration information;

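the receiving module is used for receiving the operation that a user selects a target node from the at least one node;

and the processing module is used for responding to the operation, determining the target node as a back-brushing node, and adding the back-brushing node to a back-brushing node set so as to update data in a target data table through the back-brushing nodes in the back-brushing node set, wherein the target data table has a dependency relationship with the back-brushing node.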

In a fifth aspect, an embodiment of the present invention provides a node determining apparatus, where the apparatus may include:

the acquisition module is used for acquiring a back-brushing node set related to data updating and configuration information of back-brushing nodes in the back-brushing node set;

the processing module is used for determining a target data table according to the configuration information, and the target data table and the back-brushing node have a dependency relationship;

and the display module is used for displaying the target node corresponding to the target data table when the target data table is detected to meet the first preset condition, so that the user adds the target node to the back-brushing node set.

In a sixth aspect, an embodiment of the present invention provides a data processing apparatus, where the apparatus may include:

the acquisition module is used for acquiring a back-brushing node set related to data updating and configuration information of back-brushing nodes in the back-brushing node set;

the processing module is used for determining a target data table according to the configuration information, and the target data table and the back-brushing node have a dependency relationship;

the display module is used for displaying a target node corresponding to the target data table when the target data table is detected to meet a first preset condition;

the updating module is used for obtaining a new back-brushing node set when the target node is added to the back-brushing node set;

the processing module is further used for updating the data in a first data table through the new back-brushing nodes in the new back-brushing node set to obtain result data; wherein the first data table comprises data tables having a dependency relationship with the new back-brushing node.

In a seventh aspect, an embodiment of the present invention provides a computing device, where the device includes: a processor and a memory storing computer program instructions;

the processor, when executing the computer program instructions, implements the node determination method provided by the first aspect, the node determination method provided by the second aspect, or the data processing method provided by the third aspect.

In an eighth aspect, an embodiment of the present invention provides a computer storage medium, on which computer program instructions are stored, and when executed by a processor, the computer program instructions implement the node determination method provided in the first aspect, the node determination method provided in the second aspect, or the data processing method provided in the third aspect.

The method provided by the embodiment of the invention is suitable for a variety of scenarios in which data is back-brushed for batches of nodes. First, a back-brushing node set related to data updating and the configuration information of the back-brushing nodes in the set are acquired; next, a target data table having a dependency relationship with the back-brushing node is determined according to the configuration information; then, when it is automatically detected that the target data table meets the first preset condition, the target node corresponding to the target data table is displayed, so that the user can add the target node to the back-brushing node set. The back-brushing nodes can therefore be determined through a single node selection, which effectively avoids repeated work by the user. In addition, the method provides a back-brushing tool that determines the target node automatically, so the nodes do not need to be maintained manually. This improves the efficiency of determining the back-brushing nodes, which in turn supports the efficiency of the subsequent data back-brushing and saves labor cost and computing cost.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic diagram of a data processing interface provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a node determination method provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of a back-brushing type interface provided by an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a node determination apparatus provided by an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a node determination apparatus provided by an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.

Detailed Description

Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

At present, big data research and development frequently encounters various data back-brushing requirements. If data covering anywhere from a few months to several years needs to be back-brushed using the current data back-brushing approach, data developers spend a great deal of time and effort, and various problems occur frequently.

The current data back-brushing approach is mainly suited to single-node back-brushing. Specifically, the data in the data table corresponding to the upstream node or the downstream node must be checked manually for missing data, and only when the manual check finds no missing data does the user select the next node to continue the data back-brushing.

The application scenarios of this approach are therefore limited; it can only be applied to data back-brushing for a single node. When data must be back-brushed for batches of nodes, the back-brushing has to be carried out many times, missing data has to be detected manually, and the back-brushing order of the nodes has to be maintained manually. The whole process is therefore time-consuming, labor-intensive and inefficient, and it requires significant labor cost and computing resources. Moreover, because the process relies on manual operations such as detecting missing data and maintaining the back-brushing order of the nodes, it is error-prone, which leads to errors in the back-brushed result data. In addition, if data back-brushing is performed through the back-brushing nodes without management and control, cluster computing resources are occupied, other normally running nodes are adversely affected, and the normal output of business data (such as statistics of sales, click rate, inter-transmission data between points and the like) is affected.

For example, when the data of a certain data product module is back-brushed in this way, the module has many nodes, the period is long, and the dependencies between the nodes are complex. After the back-brushing of an upstream node is completed, the next node is started only after a manual check finds no problem. This consumes a lot of time, the operator easily loses track, the correctness of the node back-brushing order is difficult to guarantee, and the accuracy of the data is even harder to guarantee. As another example, after the node code of a key node is adjusted, the nodes of the related downstream data products need to be back-brushed; the downstream nodes number in the thousands and are difficult to sort out manually. As a further example, some data products need data back-brushing periodically, and each back-brushing consumes a large amount of time and effort. Because such back-brushing involves a large number of nodes, if they are all submitted to run at the same time without management and control, a large amount of the computing platform's computing resources is occupied and other normally running nodes on the platform are adversely affected.

Therefore, to solve the above problems, embodiments of the present invention provide a data processing method, a node determination method, an apparatus, a device and a medium, which make data back-brushing simpler and more efficient, reduce the risk of problems caused by data back-brushing, improve its working efficiency, and save labor cost and computing cost.

FIG. 1 is a schematic diagram of a data processing interface provided in an embodiment of the present invention. As shown in FIG. 1, a user (e.g., a data developer) selects, on a terminal device and according to a data development requirement, a back-brushing node set related to data updating. The back-brushing node set includes at least one back-brushing node (e.g., back-brushing node A, back-brushing node B and back-brushing node C), so that data back-brushing is performed through the at least one back-brushing node.

Then, the terminal device may display target options according to the determined back-brushing nodes, so that the user can select or enter, through the target options, the configuration information of the back-brushing nodes related to the data back-brushing (such as the task name, back-brushing period, scheduling type, concurrency and scheduling time shown in FIG. 1).
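As a rough illustration of how such configuration information might be represented, the sketch below models the fields named in FIG. 1 as a Python dataclass. The class name `BackBrushConfig`, the field names, the value "serial" for the scheduling type and the representation of the scheduling time as a start/end pair are all assumptions made for this example; the source does not specify a concrete data model.

```python
from dataclasses import dataclass
from datetime import date, time, timedelta
from typing import List, Tuple


@dataclass
class BackBrushConfig:
    """Hypothetical container for the configuration information of a back-brushing task."""
    task_name: str
    periods: List[Tuple[date, date]]       # back-brushing period: possibly discontinuous, inclusive intervals
    scheduling_type: str                   # back-brushing form (the value used below is an assumption)
    concurrency: int                       # maximum number of node instances running at once
    scheduling_time: Tuple[time, time]     # time window in which the data update may run

    def business_dates(self) -> List[date]:
        """Expand the intervals of the back-brushing period into a sorted list of dates."""
        days = set()
        for start, end in self.periods:
            current = start
            while current <= end:
                days.add(current)
                current += timedelta(days=1)
        return sorted(days)


cfg = BackBrushConfig(
    task_name="demo_back_brush",
    periods=[(date(2020, 1, 1), date(2020, 1, 3)), (date(2020, 2, 10), date(2020, 2, 11))],
    scheduling_type="serial",
    concurrency=2,
    scheduling_time=(time(1, 0), time(6, 0)),
)
print(cfg.business_dates())
```

Keeping the back-brushing period as a list of intervals reflects the statement that the period may comprise several discontinuous time intervals.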

Next, a target data table (such as target data table X) having a dependency relationship with the back-brushing node is determined according to the configuration information selected by the user, and whether data is missing from the first partition corresponding to the configuration information in the target data table is detected. If data is missing from the first partition, the terminal device displays the target node corresponding to the first partition.
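The missing-data check on the first partition can be sketched as follows. This is only an illustration under stated assumptions: the target data table is represented by a mapping from business date to row count, and a partition counts as "missing" when it is absent or has zero rows. The function names are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, List


def missing_partitions(row_counts: Dict[date, int], required: Iterable[date]) -> List[date]:
    """Return the required business dates whose partition is absent or empty (assumed definition)."""
    return [day for day in required if row_counts.get(day, 0) == 0]


def meets_first_preset_condition(row_counts: Dict[date, int], required: Iterable[date]) -> bool:
    """The condition holds when at least one required partition is missing data."""
    return bool(missing_partitions(row_counts, required))


# Target data table X: the partition for 2020-03-18 was never produced.
table_x = {date(2020, 3, 17): 120, date(2020, 3, 19): 98}
required_days = [date(2020, 3, 17), date(2020, 3, 18), date(2020, 3, 19)]

print(missing_partitions(table_x, required_days))
print(meets_first_preset_condition(table_x, required_days))
```

When the check returns true, the node associated with table X would be the one offered to the user.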

Therefore, the upstream node and/or downstream node associated with the node selected by the user, that is, with the back-brushing node, can be determined through a single node selection by the user, which effectively avoids repeated work by the user. In addition, the method provided by the embodiment of the invention offers the user a back-brushing tool that determines the target node automatically, so the nodes do not need to be maintained manually. This improves the efficiency of determining the back-brushing nodes, which in turn supports the efficiency of the subsequent data back-brushing and saves labor cost and computing cost.

Then, when the target node added to the back-brushing node set by the user is received, the back-brushing node set selected by the user is updated to obtain a new back-brushing node set. The data in the first data table is then updated through the new back-brushing nodes in the new back-brushing node set to obtain result data, which completes the data back-brushing. The first data table comprises data tables having a dependency relationship with the new back-brushing node.
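One possible way to derive an execution order for the new back-brushing node set is a topological sort over the node dependencies, sketched below with the standard-library `graphlib` module (Python 3.9+). The source does not state how the order is computed; the `upstream` mapping and the function name are assumptions for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def back_brush_order(node_set: Set[str], upstream: Dict[str, Set[str]]) -> List[str]:
    """Order the nodes so that each node runs after the upstream nodes it depends on.

    Only dependencies between nodes inside `node_set` are considered.
    """
    graph = {
        node: {dep for dep in upstream.get(node, set()) if dep in node_set}
        for node in node_set
    }
    return list(TopologicalSorter(graph).static_order())


# The user accepted target node P, which is added to the original set {A, B}.
new_set = {"A", "B"} | {"P"}
# Assumed dependencies: A reads what P produces, B reads what A produces.
upstream = {"A": {"P"}, "B": {"A"}}

order = back_brush_order(new_set, upstream)
print(order)  # ['P', 'A', 'B']
```

Deriving the order from recorded dependencies removes the need to maintain the back-brushing sequence by hand.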

It should be noted that the nodes involved in the embodiments of the present invention, such as the back-brushing node and the new back-brushing node, all refer to the basic units that describe data analysis and processing on the computing platform; a node includes information such as computation code, node dependency relationships and node scheduling parameters. The target data table, the first data table and the like refer to data storage objects on the computing platform. The first partition concerns a partitioned table (also referred to as a partition table), that is, a table for which a partition space is specified at creation time by designating certain fields of the table as partition columns. In practical applications, most data tables are partitioned tables. A partition can be understood as a classification: through classification, different types of data are placed under different directories. The classification criterion is the partition field, of which there may be one or more. The business date is generally used as the partition field, and each business-date partition stores the data of the corresponding business date.
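The idea of a partition as a classification into directories can be illustrated with a small sketch. The partition field name `ds` and the `field=value` directory naming are assumptions borrowed from a common convention; the source does not name a field or a storage layout.

```python
from collections import defaultdict
from typing import Dict, List


def partition_rows(rows: List[dict], partition_field: str = "ds") -> Dict[str, List[dict]]:
    """Group rows by the value of the partition field, one 'directory' per value."""
    partitions = defaultdict(list)
    for row in rows:
        directory = f"{partition_field}={row[partition_field]}"
        partitions[directory].append(row)
    return dict(partitions)


rows = [
    {"ds": "2020-03-18", "sales": 10},
    {"ds": "2020-03-18", "sales": 7},
    {"ds": "2020-03-19", "sales": 12},
]
layout = partition_rows(rows)
print(sorted(layout))
```

With the business date as the partition field, back-brushing a given date only needs to rewrite the directory for that date.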

In summary, the method provided by the embodiment of the present invention is applicable to scenarios in which data is back-brushed for batches of nodes. The terminal device automatically determines whether data is missing from the first partition corresponding to the configuration information in the target data table; determines the target node according to its dependency relationship with the back-brushing node, so as to complete the back-brushing node set automatically; and automatically performs back-brushing resource management and concurrency control. This reduces the risk of human error in data back-brushing and the risk of affecting other normally running nodes in the computing cluster. In addition, the embodiment of the invention provides tool support related to the configuration information, which minimizes manual effort while reducing the consumption of computing resources, thereby saving labor cost and computing cost.
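Concurrency control of this kind can be sketched with a bounded worker pool, where the pool size plays the role of the configured concurrency. This is a minimal illustration using Python's `concurrent.futures`; the `Probe` class exists only to observe how many instances run at once, and all names and instance labels are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_instances(instances: Iterable[str], run_one: Callable[[str], str], concurrency: int) -> List[str]:
    """Run node instances with at most `concurrency` of them executing at the same time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(run_one, instances))


class Probe:
    """Stand-in for a node instance runner that records the peak number of active instances."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def run(self, instance: str) -> str:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # placeholder for the real back-brushing work
        with self._lock:
            self.active -= 1
        return f"{instance}: done"


probe = Probe()
instances = [f"node_A@2020-03-{day:02d}" for day in range(1, 7)]
results = run_instances(instances, probe.run, concurrency=2)
print(probe.peak <= 2)
```

Capping the pool size keeps a large back-brushing job from occupying the whole cluster, which is the concern raised in the description.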

In addition to the above scenario, embodiments of the present invention can also be used in other data processing scenarios, for example a city brain scenario in which certain historical nodes need to be determined. Specifically, when the terminal acquires configuration information related to the data to be updated (or, depending on the scenario, other data such as data related to external services or computation data), it displays at least one node associated with the configuration information; receives an operation in which a user selects a target node from the at least one node; and, in response to the operation, determines the target node as a back-brushing node and adds the back-brushing node to the back-brushing node set, so that data in a target data table is updated through the back-brushing nodes in the set, the target data table having a dependency relationship with the back-brushing node. Alternatively, the target node is directly determined as a function node related to the scenario, for example a computing node related to computation data or a service node related to data of an external service.

Based on this, the data processing method in the embodiment of the present invention is described in detail below. The data processing method may comprise two stages: a node determination stage and a scheduling stage for data back-brushing. The method can thus satisfy the data back-brushing needs of various user scenarios and help users back-brush data efficiently and safely. The node determination method is described in detail below with reference to FIG. 2.

Fig. 2 is a schematic flowchart illustrating a node determination method according to an embodiment of the present invention. As shown in fig. 2, the node determination method may include steps 210 to 230, which are specifically as follows:

First, in step 210, a set of back-brushing nodes related to data update and configuration information of the back-brushing nodes in the set are obtained. Next, in step 220, a target data table is determined according to the configuration information, the target data table having a dependency relationship with the back-brushing node. Then, in step 230, when it is detected that the target data table meets the first preset condition, the target node corresponding to the target data table is displayed, so that the user adds the target node to the set of back-brushing nodes.

Therefore, the back-brushing node can be determined through one-time node selection, and repeated work of a user is effectively avoided. In addition, the method provides a back-brushing tool for automatically determining the target node, and the nodes do not need to be maintained manually, so that the efficiency of determining the back-brushing nodes is improved, the efficiency of subsequent data back-brushing is further ensured, and the labor cost and the calculation cost are saved.

The above steps are described separately below.

First, regarding step 210, the embodiment of the present invention provides the following four ways of obtaining the set of back-brushing nodes related to data update, one for each of several application scenarios:

Mode (1): determining the back-brushing node set according to a single node.

A first input by which the user selects a single node in the computing platform is received; in response to the first input, the back-brushing node set is generated according to the single node selected by the user. It is to be understood that the set then includes only one back-brushing node.

Mode (2): searching for nodes in batch.

A second input by which the user selects a plurality of nodes in the computing platform is received, and, in response to the second input, the back-brushing node set is generated according to the plurality of nodes selected by the user. It can be understood that the set then includes a plurality of back-brushing nodes, possibly a large number of them.

Mode (3): determining the intermediate link nodes through a start node and a termination node.

A third input by which the user selects a start node and a termination node in an interface displayed by the terminal device is received; in response to the third input, the back-brushing nodes that are located between the start node and the termination node and are related to data update are screened out from the computing platform according to the obtained start node and termination node, and the back-brushing node set is generated from these back-brushing nodes.

Here, the user only needs to specify the start node and the termination node, and the terminal device determines the intermediate link nodes that exist between them according to the relationship between the two nodes. This reduces both the time the user spends searching for intermediate link nodes and the number of user operations. It should be noted that, in this scenario, the back-brushing node set may include the start node and the termination node in addition to the intermediate link nodes between them.
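The screening in mode (3) can be illustrated with a small sketch. It assumes the node relationships are available as a directed acyclic graph that maps each node to its downstream nodes; the graph, the node names and the helper functions `reachable` and `nodes_between` are hypothetical and only illustrate one possible way of finding the nodes between a start node and a termination node.

```python
from collections import deque


def reachable(graph, source):
    """Return every node reachable from `source` (including itself) in `graph`."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def nodes_between(downstream, start, end):
    """Nodes lying on some path from `start` to `end`, endpoints included."""
    # Reverse the edges (child -> parents) for the backward search from `end`.
    upstream = {}
    for parent, children in downstream.items():
        for child in children:
            upstream.setdefault(child, []).append(parent)
    forward = reachable(downstream, start)   # start and everything downstream of it
    backward = reachable(upstream, end)      # end and everything upstream of it
    return forward & backward


# Hypothetical node graph: node -> list of downstream nodes.
dag = {
    "ods_orders": ["dwd_orders", "dwd_refunds"],
    "dwd_orders": ["dws_sales"],
    "dwd_refunds": ["dws_refund_rate"],
    "dws_sales": ["ads_report"],
}
between = nodes_between(dag, "ods_orders", "ads_report")
```

In this sketch a node is kept only if it is both downstream of the start node and upstream of the termination node, so side branches such as `dwd_refunds` are left out of the set.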

Mode (4): generating the back-brushing node set according to an upstream node and/or a downstream node of a certain node.

A fourth input by which the user selects, according to the user's needs, a first node from the plurality of nodes of the computing platform is received; in response to the fourth input, an upstream node and/or a downstream node of the first node is screened out from the plurality of nodes; the upstream node and/or the downstream node is taken as a back-brushing node; and the back-brushing node set is generated from these back-brushing nodes.

Here, in a scenario where the user does not know the upstream node and/or the downstream node of the first node, the upstream node and/or the downstream node can be screened out in this way and the back-brushing node set generated. This saves the user from having to record the upstream and/or downstream nodes of each node, is suitable for scenarios in which data is back-brushed through a plurality of back-brushing nodes, and improves the efficiency of data back-brushing.
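A minimal sketch of mode (4) follows, under the assumption that the node relationships form a directed acyclic graph mapping each node to its downstream nodes. The function names `collect` and `back_brushing_set`, the two boolean flags and the sample graph are hypothetical.

```python
def collect(graph, first_node):
    """Collect all nodes reachable from `first_node`, excluding the node itself."""
    found, stack = set(), [first_node]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


def back_brushing_set(downstream, first_node,
                      include_upstream=True, include_downstream=True):
    """Build a back-brushing node set from the upstream and/or downstream nodes."""
    # Reverse the edges so that upstream nodes can be walked the same way.
    upstream = {}
    for parent, children in downstream.items():
        for child in children:
            upstream.setdefault(child, []).append(parent)
    result = set()
    if include_upstream:
        result |= collect(upstream, first_node)
    if include_downstream:
        result |= collect(downstream, first_node)
    return result


# Hypothetical node graph: node -> list of downstream nodes.
dag = {
    "ods_orders": ["dwd_orders", "dwd_refunds"],
    "dwd_orders": ["dws_sales"],
    "dws_sales": ["ads_report"],
}
both = back_brushing_set(dag, "dwd_orders")
only_down = back_brushing_set(dag, "dwd_orders", include_upstream=False)
```

The two flags mirror the "and/or" wording of the text: the caller decides whether upstream nodes, downstream nodes, or both are taken as back-brushing nodes.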

It should be noted that, in the process of selecting back-brushing nodes from the plurality of nodes of the computing platform and adding them to the back-brushing node set, the user can search for and add further back-brushing nodes after each addition, and this process can be repeated. After the back-brushing nodes have been added, the terminal device may create a back-brushing job, that is, an abstract representation of the back-brushing node set that represents the node range of the data back-brushing. The back-brushing job can be reused in later back-brushing, which avoids selecting the same back-brushing nodes again.

Based on this, after determining the set of the back-brushing nodes, some necessary information also needs to be configured each time the back-brushing of data is performed. The embodiment of the present invention further provides a manner for obtaining configuration information of a refresh node in a refresh node set, which is specifically as follows:

displaying a target option corresponding to the configuration information to a user through the terminal equipment; and responding to the selection input of the target option by the user, and acquiring configuration information corresponding to the target option, wherein the configuration information comprises a back-brushing period, a scheduling type, a concurrency degree and scheduling time.

The back-brushing period is the time corresponding to the data update, and may include a plurality of non-consecutive time intervals. For example, the user specifies the service time whose data is to be back-brushed. The data from November to December may be back-brushed; alternatively, multiple discrete service time intervals may be designated, such as [November to December of a given year, September to October of a given year]; still alternatively, the user may specify that only the data of a certain day of the week or a certain day of the month is back-brushed, depending on the actual situation.

The scheduling type determines the form in which the data update is back-brushed. For example, multi-day parallel or single-day serial may be selected; if multi-day parallel is configured, data of different service dates can be back-brushed simultaneously; if single-day serial is configured, the data is back-brushed serially in ascending order of service date.

The concurrency degree is the maximum number of node instances allowed to run simultaneously during data update, that is, during data back-brushing. When data back-brushing needs to be completed faster, the concurrency degree can be set larger to reduce the overall back-brushing time. Here, a node instance is the object obtained by instantiating a node; a corresponding node instance is generated every time the node runs. The number of node instances is the count of such objects.

The scheduling time is the time interval in which data update is allowed to be performed. For example, the scheduling time may be configured as 10:30-23:00; the back-brushing task is then allowed to run only in this period every day and does not run at other times, so as to avoid preempting the computing resources of other normal computing tasks. Here, the back-brushing task is the object obtained by instantiating the back-brushing job. The back-brushing job mainly defines the set of back-brushing nodes to be back-brushed, and the back-brushing task can be understood as one run of that set; the back-brushing job can be run only after configuration information such as the back-brushing period, scheduling type, task concurrency number and priority is added.
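The four configuration items described above could be held in a structure such as the following sketch. The class name `BackBrushConfig`, its field names, the scheduling-type strings and the example dates are assumptions made for illustration; only the four items themselves and the 10:30-23:00 window come from the description.

```python
from dataclasses import dataclass
from datetime import date, time, timedelta


@dataclass
class BackBrushConfig:
    periods: list           # (start_date, end_date) pairs, inclusive, may be non-consecutive
    scheduling_type: str    # "multi_day_parallel" or "single_day_serial" (assumed names)
    concurrency: int        # maximum number of node instances running at once
    window_start: time      # start of the daily scheduling time
    window_end: time        # end of the daily scheduling time

    def service_dates(self):
        """Expand the configured periods into a sorted list of service dates."""
        days = set()
        for start, end in self.periods:
            current = start
            while current <= end:
                days.add(current)
                current += timedelta(days=1)
        return sorted(days)

    def may_run_at(self, moment):
        """True if a back-brushing task may run at clock time `moment`."""
        return self.window_start <= moment <= self.window_end


# Hypothetical configuration with two non-consecutive periods.
cfg = BackBrushConfig(
    periods=[(date(2020, 11, 1), date(2020, 11, 3)),
             (date(2020, 9, 29), date(2020, 9, 30))],
    scheduling_type="single_day_serial",
    concurrency=4,
    window_start=time(10, 30),
    window_end=time(23, 0),
)
dates = cfg.service_dates()
```

Sorting the expanded dates gives the ascending order needed for single-day serial scheduling, while `may_run_at` captures the rule that tasks run only inside the configured window.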

In addition, it should be noted that the manner in the method that involves obtaining the initial set of back-brushing nodes may include:

acquiring configuration information related to the update data; presenting at least one node associated with the configuration information; receiving an operation of selecting a target node in at least one node by a user; in response to the operation, the target node is determined to be a back-brushing node, and the back-brushing node is added to the set of back-brushing nodes to update data in the target data table by the back-brushing nodes in the set of back-brushing nodes, the target data table having a dependency relationship with the back-brushing nodes.

Next, referring to step 220, the target data table in the embodiment of the present invention may be determined directly according to the configuration information, or may be confirmed according to the configuration information and the data table on the back-brushing node.

Then, regarding step 230, in a possible embodiment the following may be performed before this step: detecting whether the target data table meets the first preset condition.

Further, a first partition corresponding to the configuration information is determined in the target data table, and it is detected whether data in the first partition is missing; if data in the first partition is missing, the target data table meets the first preset condition, and the target node is displayed so that the user adds the target node to the back-brushing node set.

Here, the terminal device automatically checks whether the partition of the target data table used in the back-brushing node code that corresponds to the service back-brushing period exists. If the data in that partition is missing, the back-brushed result data may be empty or wrong. Therefore, if the data in the partition is missing, the node corresponding to the data table containing the partition can be determined as the target node, so that the user can add the target node to the set of back-brushing nodes to perform the back-brushing task; alternatively, the data of the partition of the target data table corresponding to the latest service back-brushing period is used for data back-brushing.
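The partition check can be sketched as follows. The sketch assumes that partition metadata is available as a mapping from partition date to row count and that a partition with zero rows counts as missing; the table names, the `producer_of` mapping and both function names are hypothetical.

```python
def missing_partitions(existing_partitions, required_dates):
    """Return the required dates whose partition is absent or holds no rows."""
    return [d for d in required_dates if existing_partitions.get(d, 0) == 0]


def target_nodes_to_suggest(table_partitions, producer_of, required_dates):
    """Map each node that produces an incomplete table to its missing partitions."""
    suggestions = {}
    for table, partitions in table_partitions.items():
        missing = missing_partitions(partitions, required_dates)
        if missing:
            suggestions[producer_of[table]] = missing
    return suggestions


# Hypothetical metadata: table -> {partition date: row count}.
table_partitions = {
    "dwd_orders": {"20201101": 1200, "20201102": 0},
    "dwd_users": {"20201101": 50, "20201102": 48, "20201103": 51},
}
# Hypothetical mapping from each table to the node that produces it.
producer_of = {"dwd_orders": "node_build_orders", "dwd_users": "node_build_users"}

suggested = target_nodes_to_suggest(
    table_partitions, producer_of, ["20201101", "20201102", "20201103"]
)
```

The nodes returned here correspond to the target nodes that would be displayed to the user for addition to the back-brushing node set.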

Thus, in a possible embodiment, based on the target node determined in the above manner, after step 230, a process of scheduling data back-flushing in data processing may be further included, and the process of step 240 and step 250 is described in detail below, specifically as follows:

Step 240: adding the target node to the back-brushing node set to obtain a new back-brushing node set.

Here, two ways are provided in the embodiment of the present invention to obtain a new back-brushing node set.

Mode (1): within a preset time period after the target node corresponding to the target data table is displayed to the user, if no operation by which the user adds the target node to the back-brushing node set is received, the target node can be automatically added to the back-brushing node set to obtain a new back-brushing node set.

Mode (2): a fifth input by which the user adds the target node to the back-brushing node set is received, and, in response to the fifth input, the target node is added to the back-brushing node set to obtain a new back-brushing node set.

Step 250: updating the data in the first data table through the new back-brushing nodes in the new back-brushing node set to obtain result data.

Wherein the first data table comprises data tables having a dependency relationship with the new back-brushing node.

Here, in a possible embodiment, the step may specifically include:

adjusting the back-brushing code of the new back-brushing node according to the configuration information to obtain a target back-brushing code; and updating the data in the first data table according to the target refresh code to obtain result data.

Further, some automated optimizations may be applied to the back-brushing code of the back-brushing node (or of the new back-brushing node), for example multi-period merged back-brushing, column-level back-brushing or row-level back-brushing, so as to reduce risk and save cost.

As shown in fig. 3, in the embodiment of the present invention, the back-brushing code of the new back-brushing node may be adjusted in different ways to obtain the target back-brushing code, specifically as follows:

Scenario (1): multi-period merged back-brushing

When the configuration information includes data indicating that a plurality of time intervals are to be updated, the date partition field of the back-brushing code is adjusted to obtain a target back-brushing code that includes a dynamic partition field.

For example, if a back-brushing node needs to back-brush service data of a plurality of back-brushing periods, the code of the node is normally executed a plurality of times and a plurality of node instances are generated, each node instance differing only in that it uses the input data of a different back-brushing period. For multiple back-brushing periods, the user may choose multi-period merged back-brushing. The date partition field of the back-brushing code is then adjusted, that is, the original node code is automatically rewritten so that the service date partition field is used as a dynamic partition field. With back-brushing code that includes the dynamic partition field, the code of the node only needs to be executed once, and a single node instance back-brushes the service data of all the back-brushing periods, which reduces both the consumption of computing resources and the running time.
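The difference between per-period execution and merged execution can be sketched by generating both forms of statement. The description does not name a SQL dialect; the sketch assumes Hive-style `INSERT OVERWRITE ... PARTITION` syntax, and the table names, column names and the partition column `ds` are hypothetical. A real rewriter would have to parse the original node code rather than fill a template.

```python
def static_statements(target, source, columns, partition_col, dates):
    """One statement per service date, each writing a fixed (static) partition."""
    cols = ", ".join(columns)
    return [
        f"INSERT OVERWRITE TABLE {target} PARTITION ({partition_col}='{d}') "
        f"SELECT {cols} FROM {source} WHERE {partition_col}='{d}'"
        for d in dates
    ]


def merged_statement(target, source, columns, partition_col, dates):
    """A single statement whose partition value is taken from the data (dynamic)."""
    # The partition column is selected last so it can feed the dynamic partition.
    cols = ", ".join(columns + [partition_col])
    date_list = ", ".join(f"'{d}'" for d in dates)
    return (
        f"INSERT OVERWRITE TABLE {target} PARTITION ({partition_col}) "
        f"SELECT {cols} FROM {source} WHERE {partition_col} IN ({date_list})"
    )


dates = ["20201101", "20201102", "20201103"]
per_period = static_statements("dws_sales", "dwd_orders", ["shop_id", "amount"], "ds", dates)
merged = merged_statement("dws_sales", "dwd_orders", ["shop_id", "amount"], "ds", dates)
```

Here three statements, and therefore three node instances, collapse into one. Node code that aggregates would additionally need the partition column in its grouping, which this template does not attempt.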

Scenario (2): column-level back-brushing

When the configuration information includes data indicating that at least one column of the first data table is to be updated, the back-brushing code is adjusted to obtain a target back-brushing code that includes the column code of the at least one column.

For example, for some data tables only the data of some columns needs to be back-brushed, such as newly added columns or columns containing abnormal data. The user can then select column-level back-brushing. In this way, the original node code is automatically rewritten to generate code that back-brushes only those columns. Because only part of the columns is computed, the time and the computing resources consumed by data back-brushing are greatly reduced.
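The effect of column-level back-brushing can be illustrated on in-memory rows. The sketch assumes that each column has its own computation rule and that rows are matched by a key column; the function name, the `id` key, the sample data and the rule for `fee` are hypothetical, and the sketch does not show how node code is rewritten.

```python
def column_level_back_brush(existing_rows, source_rows, column_rules,
                            columns_to_refresh, key="id"):
    """Recompute only the selected columns and keep all other columns as stored."""
    source_by_key = {row[key]: row for row in source_rows}
    refreshed = []
    for row in existing_rows:
        updated = dict(row)  # copy so the stored rows are not mutated
        src = source_by_key.get(row[key])
        if src is not None:
            for col in columns_to_refresh:
                updated[col] = column_rules[col](src)
        refreshed.append(updated)
    return refreshed


# Hypothetical result table in which the newly added column "fee" is still empty.
existing = [{"id": 1, "amount": 10, "fee": None},
            {"id": 2, "amount": 20, "fee": None}]
# Hypothetical upstream data from which the column is derived.
source = [{"id": 1, "raw_fee": 1.5}, {"id": 2, "raw_fee": 2.5}]
rules = {"fee": lambda src: src["raw_fee"]}

result = column_level_back_brush(existing, source, rules, ["fee"])
```

Only the rule for `fee` is evaluated; the `amount` column is carried over unchanged, which is where the saving in computation comes from.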

Scenario (3): row-level back-brushing

When the configuration information includes data indicating that at least one row of the first data table is to be updated, the back-brushing code is adjusted to obtain a target back-brushing code that includes the row code of the at least one row.

For example, in some cases only part of the row data needs to be back-brushed. Suppose a data table holds transaction statistics for a total of 1 million merchants, and the data of 10 merchants is found to be wrong and needs to be back-brushed. The user can select row-level back-brushing, and the original node code is automatically rewritten into code that back-brushes only those rows. The amount of data involved in the computation is greatly reduced, so that, as with column-level back-brushing, the back-brushing time and the computing resources consumed are greatly reduced.
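Row-level back-brushing can be illustrated in the same in-memory style. The sketch assumes rows are identified by a `merchant_id` key and that a `recompute` callable can rebuild one row; both, together with the sample data, are hypothetical stand-ins for the rewritten node code.

```python
def row_level_back_brush(existing_rows, recompute, keys_to_refresh, key="merchant_id"):
    """Recompute only the rows whose key is listed; return rows and recompute count."""
    result = []
    recomputed = 0
    for row in existing_rows:
        if row[key] in keys_to_refresh:
            result.append(recompute(row[key]))
            recomputed += 1
        else:
            result.append(row)  # untouched rows are reused as they are
    return result, recomputed


# Hypothetical statistics table for five merchants, two of which are wrong.
existing = [{"merchant_id": m, "total": 0} for m in range(1, 6)]


def recompute(merchant_id):
    """Stand-in for the real per-merchant computation."""
    return {"merchant_id": merchant_id, "total": merchant_id * 100}


rows, count = row_level_back_brush(existing, recompute, {2, 4})
```

The count returned alongside the rows shows how little work is done compared with recomputing the whole table.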

And then, after the back-brushing task is created, corresponding instance resource information and node instance dependency relationship need to be generated, and execution is scheduled according to the instance relationship graph. The node instance information in the embodiment of the present invention may include instance dependency relationship and/or instance resource information, and thus, a manner how to determine the instance dependency relationship and the instance resource information is provided in the embodiment of the present invention, which is specifically shown as follows:

(1) determining instance dependencies

Acquiring a first dependency relationship of a data table generated by actual operation of a new back-brushing node;

and determining the instance dependency relationship of the new back-brushing node according to the first dependency relationship and the second dependency relationship recorded in the new back-brushing node.

This may be referred to as dependency completion: in addition to the dependencies explicitly defined in the back-brushing node, the dependencies of the node instance are supplemented according to the dependencies of the data tables generated by the actual operation of the back-brushing node in the past 30 days.

For example, nodes that use data of the last N days (ND), the natural week (CW), the natural month (CM) or the current service date minus 2 days (T-2) may have no cross-day dependency relationship configured, although they actually need data of multiple service dates from the upstream node. If the node instance relationships are not completed, the execution order of the node instances may not match this logic, so that the final back-brushed result data is wrong.
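Dependency completion can be sketched as a union of the declared and the observed dependencies, followed by a topological ordering of the node instances. The sketch identifies a node instance by a (node, service date) pair and uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the node names, dates and function names are hypothetical.

```python
from graphlib import TopologicalSorter


def complete_dependencies(declared, observed):
    """Merge declared and observed upstream instances for each node instance."""
    merged = {}
    for source in (declared, observed):
        for instance, upstream in source.items():
            merged.setdefault(instance, set()).update(upstream)
    return merged


def execution_order(dependencies):
    """Order node instances so that every upstream instance runs first."""
    return list(TopologicalSorter(dependencies).static_order())


report = ("t2_report", "2020-11-03")
# Second dependency relationship: what is recorded in the node itself.
declared = {report: {("daily_stats", "2020-11-03")}}
# First dependency relationship: what actual runs were observed to read (T-2 data).
observed = {report: {("daily_stats", "2020-11-01")}}

merged = complete_dependencies(declared, observed)
order = execution_order(merged)
```

Without the observed cross-day edge, the instance for 2020-11-03 could be scheduled before the 2020-11-01 upstream instance has been back-brushed, which is the ordering problem the text describes.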

(2) Determining instance resource information

Acquiring a node instance for running a new back-brushing node when data is updated;

and when the resources occupied by the running node instance meet the second preset condition, determining the information corresponding to the resources occupied by the node instance as instance resource information.

Here, this may be referred to as resource management and control: the upper limit of available computing resources is set per time period, and computing resources are allocated according to the historical resource consumption of the back-brushing task. For example, a low-consumption task obtains more computing resources, which improves efficiency while preventing the stability of the computing cluster from being affected. In addition, concurrency control can be performed, that is, the number of node instances running simultaneously is controlled so as to reduce the occupation of the scheduling resources of the computing platform; a node instance is submitted to the computing platform for execution when it meets the running conditions, and result data is output.
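One possible reading of the resource and concurrency control is sketched below. It assumes each ready node instance has an estimated cost derived from historical consumption, that a single numeric cap applies in the current time period, and that cheaper instances are admitted first; the function name, the cost units and the cheapest-first policy are assumptions, not details given in the description.

```python
def pick_runnable(ready, running_count, used_resources, max_concurrency, resource_cap):
    """Choose ready instances to submit without exceeding either limit."""
    chosen = []
    # Consider the cheapest instances first (assumed policy).
    for name, cost in sorted(ready.items(), key=lambda item: item[1]):
        if running_count + len(chosen) >= max_concurrency:
            break  # concurrency limit reached
        if used_resources + cost > resource_cap:
            continue  # this instance would exceed the resource cap
        chosen.append(name)
        used_resources += cost
    return chosen


# Hypothetical ready instances with estimated costs from historical runs.
ready = {"inst_a": 4, "inst_b": 2, "inst_c": 5}
to_submit = pick_runnable(ready, running_count=1, used_resources=3,
                          max_concurrency=3, resource_cap=10)
```

Instances that are not chosen simply stay in the ready set and are considered again on the next scheduling pass, once running instances have finished and released resources.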

In conclusion, the data processing method provided by the invention is suitable for a scene of batch node data refreshing under most conditions, repeated work can be avoided by one-time node selection, the node refreshing sequence does not need to be maintained manually, and tool support of automatic partition inspection, multi-period combination refreshing, column level or row level refreshing is provided, so that the efficiency of data refreshing work is improved. In addition, the risk of manual errors of data refresh is reduced, and the risk of influencing other normal nodes in the computing cluster is reduced by automatically executing partition inspection, automatically completing the dependency relationship of the refresh nodes, and automatically performing refresh resource management and control and concurrency control. Therefore, the embodiment of the invention provides a series of tool supports, reduces the labor input as much as possible, and is beneficial to reducing the consumption of computing resources, thereby saving the labor cost and the computing cost.

Based on the above node determining method, an embodiment of the present invention further provides a node determining apparatus, which can be specifically described with reference to fig. 4.

Fig. 4 is a schematic structural diagram of a node determination apparatus according to an embodiment of the present invention.

As shown in fig. 4, the node determining means 40 may include:

an obtaining module 401, configured to obtain a set of refresh nodes related to data update and configuration information of the refresh nodes in the set of refresh nodes;

a processing module 402, configured to determine a target data table according to the configuration information, where the target data table has a dependency relationship with the back-brushing node;

the displaying module 403 is configured to, when it is detected that the target data table meets the first preset condition, display a target node corresponding to the target data table, so that the user adds the target node to the back-brushing node set.

In a possible embodiment, the obtaining module 401 may be specifically configured to obtain a start node and a stop node in a plurality of nodes; screening a back-brushing node which is positioned between the starting node and the terminating node and is related to data updating according to the starting node and the terminating node; and generating a back-brushing node set according to the back-brushing nodes.

In another possible embodiment, the obtaining module 401 may be specifically configured to obtain a first node in a plurality of nodes; screening an upstream node and/or a downstream node of a first node from a plurality of nodes; taking the upstream node and/or the downstream node as a back-brushing node; and generating a back-brushing node set according to the back-brushing nodes.

In addition, the obtaining module 401 in the embodiment of the present invention may also be configured to show the target option corresponding to the configuration information through the showing module 403; and responding to the selection input of the target option by the user, and acquiring configuration information corresponding to the target option, wherein the configuration information comprises a back-brushing period, a scheduling type, a concurrency degree and scheduling time.

The refresh cycle is time corresponding to data updating, and comprises a plurality of discontinuous time intervals; the scheduling type is used for determining a back-brushing form of the data update; the concurrency degree comprises the maximum number of node instances allowed to run simultaneously during data updating; the schedule time is a time interval in which data update is performed.

In addition, the node determining apparatus 40 in the embodiment of the present invention may further include a detecting module 404, configured to detect whether the target data table meets a first preset condition.

The detecting module 404 may be specifically configured to determine, in the target data table, a first partition corresponding to the configuration information; detecting whether missing data exists in data in the first partition; if the data in the first partition is missing, the target data table meets a first preset condition.

The processing module 402 in the embodiment of the present invention may also be configured to add the target node to the back-brushing node set to obtain a new back-brushing node set; updating the data in the first data table through the new back-brushing nodes in the new back-brushing node set to obtain result data; wherein the first data table comprises data tables having a dependency relationship with the new back-flushed node.

In a possible embodiment, the processing module 402 may be specifically configured to, according to the configuration information, adjust the refresh-back code of the new refresh-back node to obtain a target refresh-back code; and updating the data in the first data table according to the target refresh code to obtain result data.

Further, the processing module 402 may be specifically configured to: when the configuration information includes data indicating that a plurality of time intervals are to be updated, adjust the date partition field of the back-brushing code to obtain a target back-brushing code including a dynamic partition field; or, when the configuration information includes data indicating that at least one column of the first data table is to be updated, adjust the back-brushing code to obtain a target back-brushing code including the column code of the at least one column; or, when the configuration information includes data indicating that at least one row of the first data table is to be updated, adjust the back-brushing code to obtain a target back-brushing code including the row code of the at least one row.

In another possible embodiment, the processing module 402 may be specifically configured to determine node instance information of a new back-flushed node; and updating the data in the first data table according to the target back-brushing code and the node instance information to obtain result data.

Further, under the condition that the node instance information includes an instance dependency relationship, the processing module 402 may be specifically configured to obtain a first dependency relationship of a data table generated by actual operation of a new back-flushed node; and determining the instance dependency relationship of the new back-brushing node according to the first dependency relationship and the second dependency relationship recorded in the new back-brushing node.

In the case that the node instance information includes instance resource information, the processing module 402 may be specifically configured to obtain a node instance that runs a new refresh node when updating data; and when the resources occupied by the running node instance meet the second preset condition, determining the information corresponding to the resources occupied by the node instance as instance resource information.

In addition, based on the data processing method, an embodiment of the present invention further provides a data processing apparatus, which can be specifically described with reference to fig. 5. Fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.

As shown in fig. 5, the data processing apparatus 50 may include:

an obtaining module 501, configured to obtain a set of refresh nodes related to data update and configuration information of the refresh nodes in the set of refresh nodes.

And the processing module 502 is configured to determine a target data table according to the configuration information, where the target data table has a dependency relationship with the back-brushing node.

The displaying module 503 is configured to display a target node corresponding to the target data table when it is detected that the target data table meets the first preset condition.

An updating module 504, configured to obtain a new set of back-brushing nodes when the target node is added to the set of back-brushing nodes.

The processing module 502 is further configured to update the data in the first data table by a new refresh node in the new refresh node set to obtain result data; wherein the first data table comprises data tables having a dependency relationship with the new back-flushed node.

Based on the foregoing method, an embodiment of the present invention further provides a node determining apparatus, which can be specifically described with reference to fig. 6. Fig. 6 is a schematic structural diagram of a node determination apparatus according to an embodiment of the present invention.

As shown in fig. 6, the node determining means 60 may include:

an obtaining module 601, configured to obtain configuration information related to update data;

a display module 602, configured to display at least one node related to configuration information;

a receiving module 603, configured to receive an operation of selecting a target node from at least one node by a user;

and the processing module 604 is configured to, in response to the operation, determine the target node as a back-brushing node and add the back-brushing node to the set of back-brushing nodes, so as to update data in the target data table through the back-brushing nodes in the set, where the target data table has a dependency relationship with the back-brushing node.

In this way, a back-brushing node set related to data update and configuration information of the back-brushing nodes in the set are obtained; a target data table having a dependency relationship with the back-brushing node is then determined according to the configuration information; and, when it is automatically detected that the target data table meets the first preset condition, the target node corresponding to the target data table is displayed, so that the user can add the target node to the back-brushing node set. Therefore, the back-brushing nodes can be determined through a single node selection, which effectively avoids repeated work by the user. In addition, the method provides a back-brushing tool that automatically determines the target node, so that the nodes do not need to be maintained manually; this improves the efficiency of determining the back-brushing nodes, supports the efficiency of subsequent data back-brushing, and saves labor cost and computing cost.

The data processing apparatus, the node determination apparatus, and the data processing method and the node determination method according to the embodiments of the present invention described in conjunction with fig. 1 to 3 may be implemented by a computing device. The computing device, as shown in fig. 7, may include a processor 701 and a memory 702 storing computer program instructions.

Specifically, the processor 701 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured to implement one or more integrated circuits of the embodiments of the present application.

Memory 702 may include mass storage for data or instructions. By way of example, and not limitation, memory 702 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 702 may include removable or non-removable (or fixed) media, where appropriate. The memory 702 may be internal or external to the computing device, where appropriate. In a particular embodiment, the memory 702 is non-volatile solid-state memory. In a particular embodiment, the memory 702 includes Read Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these.

The processor 701 realizes the data processing method or the node determination method in the above-described embodiments by reading and executing the computer program instructions stored in the memory 702.

The transceiver 703 is mainly used for communication among the apparatuses in the embodiment of the present invention and for communication with other devices.

In one example, the device may also include a bus 704. As shown in fig. 7, the processor 701, the memory 702, and the transceiver 703 are connected via a bus 704 to complete communication therebetween.

Bus 704 includes hardware, software, or both. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 704 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.

In one possible embodiment, the present invention provides a computer-readable storage medium on which a computer program is stored, which, when executed in a computer, causes the computer to perform a data processing method or a node determination method of the present invention.

It is to be understood that the invention is not limited to the particular arrangements and instrumentality described in the above embodiments and shown in the drawings. For convenience and brevity of description, detailed description of a known method is omitted here, and for the specific working processes of the system, the module and the unit described above, reference may be made to corresponding processes in the foregoing method embodiments, which are not described herein again.

It will be apparent to those skilled in the art that the method procedures of the present invention are not limited to the specific steps described and illustrated, and that those skilled in the art, having understood the spirit of the present invention, can make various changes, modifications, additions or equivalent substitutions, or change the order of the steps, within the technical scope of the present invention.
