Algorithm packaging method based on distributed technology

Document No.: 1798196 · Published: 2021-11-05

Note: This patent, "Algorithm packaging method based on distributed technology" (一种基于分布式技术的算法封装方法), was created on 2021-07-31 by 田园, 保富, and 张航. Abstract: The invention discloses an algorithm encapsulation method based on distributed technology, comprising the following steps. Task input: a task to be executed is entered through an input unit. Task parsing: the input task is parsed into a number of learning plans. Algorithm optimization: the learning plans are optimized on the master node to form an optimized scheduling strategy. Task execution: according to the optimized scheduling strategy, the corresponding slave nodes are located, executors are started, and tasks are allocated and scheduled. Task output: the executed task results are refined by the algorithm optimization step and then output. Built on a distributed foundation, the invention provides basic support for computation over massive data, and its algorithm optimization step spares the user the tedious process of algorithm selection.

1. An algorithm encapsulation method based on distributed technology, characterized by comprising the following steps:

task input: entering a task to be executed through an input unit;

task parsing: parsing the input task into a number of learning plans;

algorithm optimization: optimizing the learning plans on the master node to form an optimized scheduling strategy;

task execution: according to the optimized scheduling strategy, locating the corresponding slave nodes, starting executors, and allocating and scheduling tasks;

task output: refining the executed task results through the algorithm optimization step, and outputting them.

2. The algorithm encapsulation method based on distributed technology as claimed in claim 1, wherein the task input specifically comprises: the user submits the task to be executed, in the form of an application program, to the client through the input unit, which triggers the local driver to start.

3. The algorithm encapsulation method based on distributed technology as claimed in claim 2, wherein starting the local driver specifically comprises: the user causes the local driver to start by entering a user code and a session.

4. The algorithm encapsulation method based on distributed technology as claimed in claim 1, wherein the task parsing specifically comprises: parsing the input task into a number of learning plans by means of a parser, each learning plan comprising the application program required to execute it, the number of executors required, and the corresponding resources.

5. The method of claim 4, wherein the learning plans form stored data in the form of metadata, an algorithm library, and statistics.

6. The algorithm encapsulation method based on distributed technology as claimed in claim 5, further comprising updating and adding algorithms, wherein the updating and adding specifically comprises: updating an existing algorithm or adding a new algorithm through the algorithm library.

7. The algorithm encapsulation method based on distributed technology as claimed in claim 5, wherein the algorithm optimization specifically comprises the following steps:

1) reading data: reading the stored data of the learning plan;

2) sampling: selecting a portion of the stored data as sample data;

3) standardization: standardizing the sample data;

4) initial validation: performing an initial validation on the standardized samples according to the selected scheduling strategy;

5) cross-validation: validating the initially validated samples again by cross-validation, and retaining the sample data that passes;

6) model training: selecting a portion of the stored data as a training set and training candidate models;

7) final model: selecting, according to the feature labels, the trained model that fits the feature labels as the final model.

8. The algorithm encapsulation method based on distributed technology as claimed in claim 7, wherein, in the algorithm optimization, the feature labels comprise a most-common label, a nearest-neighbor label, and a misclassification-rate label.

9. The algorithm encapsulation method based on distributed technology as claimed in claim 2, wherein the task execution specifically comprises: according to the optimized scheduling strategy, locating the corresponding slave nodes, starting executors, establishing connections between the executors and the local driver, and allocating and scheduling tasks through the executors until all tasks are completed.

10. The algorithm encapsulation method based on distributed technology as claimed in claim 1, wherein the task output step specifically comprises: refining the executed task results through algorithm optimization and then outputting them, while performing parameter optimization on the input and/or output according to the scheduling strategy from the algorithm optimization step.

Technical Field

The invention belongs to the technical field of distributed algorithm packaging, and particularly relates to an algorithm packaging method based on a distributed technology.

Background

With the arrival of the big data era, the total volume of stored data in the power industry has reached a considerable scale. If relevant knowledge can be mined from this data to guide business transformation, it will promote the rapid development of the industry. Machine learning and statistical knowledge are the key to converting these data into knowledge. However, existing machine learning algorithms are independent and scattered, relatively complex to use, and their models cannot be reused, which causes a degree of redundancy and waste of resources. To support the various data analysis projects of the power industry, a fast, flexible, simple, easy-to-use, distributed, and reusable enterprise model-building tool must be constructed, which requires comprehensive consideration of both the algorithm encapsulation method and the design and implementation of the distributed architecture.

To build such a fast, flexible, simple, easy-to-use, distributed, and reusable algorithm framework, the algorithm encapsulation method and the design and implementation of the distributed framework must be fully considered, so that the framework is both practical and usable. The framework should support distributed parallel computing, integrate functions such as algorithm optimization, parameter preference, and data sampling, and meet the basic requirements of data analysis in the power industry.

Disclosure of Invention

The invention aims to provide an algorithm encapsulation method based on distributed technology, oriented to the power industry, which makes algorithm encapsulation more optimal by converting tasks into learning plans while making use of real-time input and distributed arrangement.

To achieve this technical effect, the invention is realized through the following technical scheme.

An algorithm encapsulation method based on distributed technology comprises the following steps:

task input: entering a task to be executed through an input unit;

task parsing: parsing the input task into a number of learning plans;

algorithm optimization: optimizing the learning plans on the master node to form an optimized scheduling strategy;

task execution: according to the optimized scheduling strategy, locating the corresponding slave nodes, starting executors, and allocating and scheduling tasks;

task output: refining the executed task results through the algorithm optimization step, and outputting them.
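As an illustration only, the patent does not specify an implementation, the five steps above can be outlined in Python; every function name and data shape below (`parse_task`, `optimize`, `execute`, the plan dictionaries) is a hypothetical sketch, not taken from the patent:

```python
# Hypothetical sketch of the five-step encapsulation pipeline.
# Names and structures are illustrative, not from the patent.

def parse_task(task):
    """Task parsing: split an input task into learning plans."""
    return [{"app": task["app"], "executors": 2, "resources": "1g"}
            for _ in range(task.get("parts", 1))]

def optimize(plans):
    """Algorithm optimization on the master node: form a scheduling strategy.
    Here the "strategy" is simply the plans ordered by executor demand."""
    return sorted(plans, key=lambda p: p["executors"])

def execute(strategy):
    """Task execution: dispatch each plan to executors on slave nodes."""
    return [f"ran {p['app']} on {p['executors']} executor(s)" for p in strategy]

def run(task):
    plans = parse_task(task)      # task input -> task parsing
    strategy = optimize(plans)    # algorithm optimization
    results = execute(strategy)   # task execution
    return results                # task output (result refinement omitted)

print(run({"app": "load-forecast", "parts": 3}))
```

The point of the sketch is only the division of labor: parsing happens once per task, optimization once per batch of plans, and execution once per plan.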

In this technical scheme, compared with simply executing a series of tasks, the task is decomposed into a number of learning plans, each corresponding to a particular program; decomposing the whole job makes computation faster and more convenient.

In this technical scheme, the parsed learning plans are then optimized; during this process, the optimizer tries to quickly return a high-quality algorithm to the user in the background, sparing the user the tedious process of algorithm selection.

In this technical scheme, a distributed parallel computing framework is implemented on the basis of a master-slave mode and real-time in-memory computing, providing basic support for computation over massive data.

As a further improvement of the present invention, the task input specifically comprises: the user submits the task to be executed, in the form of an application program, to the client through the input unit, which triggers the local driver to start.

In this technical scheme, the input unit serves as the entry point, input is carried out via an application program, and the corresponding driver is started accordingly; meanwhile, the real-time data entered through the input unit provides a basis for subsequent data sampling.

As a further improvement of the present invention, starting the local driver specifically comprises: the user causes the local driver to start by entering a user code and a session.

In this technical scheme, a user code and a session are set so that each submission corresponds to a particular user; during subsequent data analysis and storage, the tasks each user typically needs to execute, the application programs that need to be started, and so on, can then be inferred.

As a further improvement of the present invention, the task parsing specifically comprises: parsing the input task into a number of learning plans by means of a parser, each learning plan comprising the application program required to execute it, the number of executors required, and the corresponding resources.

Specifically, the learning task is parsed into a number of learning plans, with a corresponding algorithm applied to each plan, so that a complex problem is simplified; at the same time, the application program, the number of executors required, and the corresponding resources are selected, providing a basis for subsequent planning.
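A learning plan as described above, application program, executor count, and resources, might be represented as a small record; the field names and the `parse` helper below are assumptions made for illustration, not the patent's data model:

```python
from dataclasses import dataclass

@dataclass
class LearningPlan:
    """One unit produced by the parser (hypothetical field names)."""
    app: str          # application program needed to execute this plan
    executors: int    # number of executors required
    resources: str    # corresponding resources, e.g. memory per executor

def parse(task_name: str, n_parts: int) -> list[LearningPlan]:
    """Parse a task into n_parts learning plans (illustrative only)."""
    return [LearningPlan(app=task_name, executors=1, resources="512m")
            for _ in range(n_parts)]

plans = parse("power-load-clustering", 3)
print(len(plans), plans[0].app)
```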

As a further development of the invention, the learning plans form stored data in the form of metadata, an algorithm library, and statistics.

In this technical scheme, the stored data is varied and diversified: metadata is available, and the algorithm library is used to run the algorithms for different tasks or learning plans. That is, each task corresponds to a particular algorithm, which can be called up directly for execution.

As a further improvement, the invention further comprises updating and adding algorithms, which specifically comprises: updating an existing algorithm or adding a new algorithm through the algorithm library.

In this technical scheme, algorithm updating and addition are supported: once a simpler or more efficient algorithm is found, it can replace the old one, and the old algorithm is discarded to save space; when a new application program is added, new algorithms inevitably follow to adapt to new tasks and the like.
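One plausible way to realize the update-and-add behaviour described above is a registry keyed by algorithm name; this dictionary-based library is an assumption for illustration, not the patent's mechanism:

```python
# Hypothetical algorithm library supporting update and addition.
algorithm_library = {}

def register(name, fn):
    """Add a new algorithm, or replace (update) an existing one of the same name."""
    algorithm_library[name] = fn

def call(name, data):
    """Each task maps to an algorithm that can be called up directly."""
    return algorithm_library[name](data)

register("mean", lambda xs: sum(xs) / len(xs))
# Update: the old "mean" is discarded and replaced by a safer version.
register("mean", lambda xs: sum(xs) / len(xs) if xs else 0.0)
# Addition: a brand-new algorithm enters the library.
register("total", sum)

print(call("total", [1, 2, 3]))
```

Registering under an existing name overwrites the old entry, matching the described behaviour of replacing and discarding superseded algorithms.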

As a further improvement of the present invention, the algorithm optimization specifically comprises the following steps:

1) reading data: reading the stored data of the learning plan;

2) sampling: selecting a portion of the stored data as sample data;

3) standardization: standardizing the sample data;

4) initial validation: performing an initial validation on the standardized samples according to the selected scheduling strategy;

5) cross-validation: validating the initially validated samples again by cross-validation, and retaining the sample data that passes;

6) model training: selecting a portion of the stored data as a training set and training candidate models;

7) final model: selecting, according to the feature labels, the trained model that fits the feature labels as the final model.

In this technical scheme, the best final model is selected as the scheduling strategy through model optimization; the resulting model performs better than before, the tedious algorithm selection process is avoided, and a high-quality algorithm is obtained.
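As a stripped-down sketch of steps 1) to 7), not the patent's implementation, the following standardizes a sample, validates trivial candidate models by misclassification rate, and selects the best as the final model. The data, candidate models, and labeling rule are all made up for illustration:

```python
import statistics

def standardize(xs):
    """Step 3: zero-mean, unit-variance standardization (illustrative)."""
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs) or 1.0
    return [(x - mu) / sd for x in xs]

def misclassification_rate(model, samples, labels):
    """Error of a candidate model; stands in for the validation of steps 4-5."""
    wrong = sum(1 for x, y in zip(samples, labels) if model(x) != y)
    return wrong / len(samples)

def select_final_model(candidates, samples, labels):
    """Step 7: pick the candidate with the lowest misclassification rate."""
    return min(candidates, key=lambda m: misclassification_rate(m, samples, labels))

raw = [1.0, 2.0, 3.0, 4.0, 5.0]          # step 2: a small sample
samples = standardize(raw)                # step 3
labels = [x > 0 for x in samples]         # made-up labeling rule
candidates = [lambda x: x > 0,            # step 6: two trivial "trained models"
              lambda x: x > 1]
best = select_final_model(candidates, samples, labels)
print(misclassification_rate(best, samples, labels))
```

A real pipeline would use proper k-fold cross-validation and actual learners; this only shows where each numbered step would sit.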

As a further improvement of the invention, in the algorithm optimization the feature labels comprise a most-common label, a nearest-neighbor label, and a misclassification-rate label.

In this technical scheme, several feature labels are selected so that the optimized model can achieve relatively high precision; because the feature labels view the data from different angles, errors are avoided as far as possible and precision is improved.

As a further improvement of the present invention, the task execution specifically comprises: according to the optimized scheduling strategy, locating the corresponding slave nodes, starting executors, establishing connections between the executors and the local driver, and allocating and scheduling tasks through the executors until all tasks are completed.

In this technical scheme, the executors are connected to the local driver, so that an executor that would otherwise run on the client with no management relationship to the master node instead runs, via the driver, on a slave node in the cluster; it thus has an association with the master node and is managed by it during subsequent execution.

As a further improvement of the present invention, the task output step specifically comprises: refining the executed task results through algorithm optimization and then outputting them, while performing parameter optimization on the input and/or output according to the scheduling strategy from the algorithm optimization step.

In this technical scheme, the output results are optimized at task output time, making them more accurate; at the same time, combined with the optimization carried out during plan execution, parameters are preferentially selected: less suitable parameters are eliminated and better, more useful ones are kept, laying a foundation for subsequent re-execution of tasks.
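The parameter-preference feedback described above could be sketched as ranking candidate parameter values after each run and keeping the best for the next run, a loose illustration under made-up assumptions (the scoring criterion here is invented, not the patent's):

```python
# Hypothetical parameter-preference step: after execution, rank candidate
# parameters so that less suitable ones drop to the back for the next run.

def refine_parameters(candidates, score):
    """Return candidates ranked best-first; the head is used on re-execution."""
    return sorted(candidates, key=score, reverse=True)

# Assumed criterion: a sampling rate is better the closer it is to 0.3.
rates = [0.1, 0.3, 0.9]
preferred = refine_parameters(rates, score=lambda r: -abs(r - 0.3))
print(preferred[0])
```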

Drawings

FIG. 1 is a flow chart of an algorithm encapsulation method based on distributed technology according to the present invention;

FIG. 2 is a flow chart of the algorithm optimization steps provided by the present invention;

FIG. 3 is a block diagram of an algorithm package framework in embodiment 4 provided by the present invention;

FIG. 4 is a schematic flow chart of an algorithm encapsulation method in embodiment 4 provided by the present invention;

FIG. 5 is a flowchart of the algorithm optimization step in embodiment 4 provided by the present invention;

FIG. 6 is a schematic diagram of the algorithm encapsulation framework in embodiment 4 provided by the present invention.

Detailed Description

The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.

Example 1

This embodiment mainly introduces a flow of an algorithm encapsulation method based on a distributed technology.

Referring to FIG. 1, an algorithm encapsulation method based on distributed technology comprises the following steps:

task input: entering a task to be executed through an input unit;

task parsing: parsing the input task into a number of learning plans;

algorithm optimization: optimizing the learning plans on the master node to form an optimized scheduling strategy;

task execution: according to the optimized scheduling strategy, locating the corresponding slave nodes, starting executors, and allocating and scheduling tasks;

task output: refining the executed task results through the algorithm optimization step, and outputting them.

In this embodiment, compared with simply executing a series of tasks, the task is decomposed into a number of learning plans, each corresponding to a particular program; decomposing the whole job makes computation faster and more convenient.

In this embodiment, the analyzed learning plan is optimized again, and in this process, the optimizer tries to quickly return a high-quality algorithm to the user in the background, so that the user avoids a cumbersome algorithm selection process.

In the embodiment, a distributed parallel computing framework is realized based on a master-slave mode and real-time memory computing, and basic support is provided for mass data computing.

Example 2

This embodiment mainly introduces task input and task parsing.

Further, the task input specifically comprises: the user submits the task to be executed, in the form of an application program, to the client through the input unit, which triggers the local driver to start.

In this embodiment, the input unit serves as the entry point, input is carried out via an application program, and the corresponding driver is started accordingly; meanwhile, the real-time data entered through the input unit provides a basis for subsequent data sampling.

Specifically, starting the local driver comprises: the user causes the local driver to start by entering a user code and a session.

In this embodiment, a user code and a session are provided so that each submission corresponds to a particular user; during subsequent data analysis and storage, the tasks each user typically needs to execute, the application programs that need to be started, and so on, can then be inferred.

The task parsing proceeds as follows: the input task is parsed into a number of learning plans by means of a parser, each learning plan comprising the application program required to execute it, the number of executors required, and the corresponding resources.

In this embodiment, the learning task is parsed into a number of learning plans, with a corresponding algorithm applied to each plan, so that a complex problem is simplified; at the same time, the application program, the number of executors required, and the corresponding resources are selected, providing a basis for subsequent planning.

For storage, the learning plans form stored data in the form of metadata, an algorithm library, and statistics.

In this embodiment, the stored data is varied and diversified: the available metadata is combined with the algorithm library to run the algorithms for different tasks or learning plans. That is, each task corresponds to a particular algorithm, which can be called up directly for execution.

To facilitate subsequent computation, the method further comprises updating and adding algorithms, which specifically comprises: updating an existing algorithm or adding a new algorithm through the algorithm library.

In this embodiment, algorithm updating and addition are supported: once a simpler or more efficient algorithm is found, it can replace the old one, and the old algorithm is discarded to save space; when a new application program is added, new algorithms inevitably follow to adapt to new tasks and the like.

Example 3

This embodiment mainly describes the algorithm optimization procedure and related steps.

Referring to FIG. 2, the algorithm optimization specifically comprises the following steps:

1) reading data: reading the stored data of the learning plan;

2) sampling: selecting a portion of the stored data as sample data;

3) standardization: standardizing the sample data;

4) initial validation: performing an initial validation on the standardized samples according to the selected scheduling strategy;

5) cross-validation: validating the initially validated samples again by cross-validation, and retaining the sample data that passes;

6) model training: selecting a portion of the stored data as a training set and training candidate models;

7) final model: selecting, according to the feature labels, the trained model that fits the feature labels as the final model.

In this embodiment, the best final model is selected as the scheduling strategy through model optimization; the resulting model performs better than before, the tedious algorithm selection process is avoided, and a high-quality algorithm is obtained.

Specifically, in the algorithm optimization, the feature labels comprise a most-common label, a nearest-neighbor label, and a misclassification-rate label.

In this embodiment, several feature labels are selected so that the optimized model can achieve relatively high precision; because the feature labels view the data from different angles, errors are avoided as far as possible and precision is improved.

Further, the task execution specifically comprises: according to the optimized scheduling strategy, locating the corresponding slave nodes, starting executors, establishing connections between the executors and the local driver, and allocating and scheduling tasks through the executors until all tasks are completed.

In this embodiment, the executors are connected to the local driver, so that an executor that would otherwise run on the client with no management relationship to the master node instead runs, via the driver, on a slave node in the cluster; it thus has an association with the master node and is managed by it during subsequent execution.

For output, the task output step specifically comprises: refining the executed task results through algorithm optimization and then outputting them, while performing parameter optimization on the input and/or output according to the scheduling strategy from the algorithm optimization step.

In this embodiment, the output results are optimized at task output time, making them more accurate; at the same time, combined with the optimization carried out during plan execution, parameters are preferentially selected: less suitable parameters are eliminated and better, more useful ones are kept, laying a foundation for subsequent re-execution of tasks.

Example 4

This embodiment is described taking a specific application as an example.

According to the requirements and characteristics of data analysis in the power industry, the invention provides an industry-oriented distributed algorithm encapsulation framework. The framework is portable, intelligent, and expandable, and supports parallelization. Its main contents are as follows. First, a distributed parallel computing framework is implemented on the basis of a master-slave mode and real-time in-memory computing, providing basic support for computation over massive data. Second, a declarative machine learning task is converted into a refined learning plan by a purpose-designed optimizer; during this process, the optimizer tries to quickly return a high-quality algorithm to the user in the background, sparing the user the tedious algorithm selection process. The invention demonstrates the strong potential of an optimizer on top of a distributed architecture and offers better help to non-professional developers in analyzing data.

An executor running on the client side is not managed or restricted by the master node. Here, by contrast, the executors run on the slave nodes in the cluster and are therefore managed and restricted by the master node. The flow of a distributed application in the cluster is as follows:

(1) the user submits the application program to the client, and the local machine starts the driver;

(2) the client submits the application program, the number of executors required, and the corresponding resources to the cluster;

(3) the master node receives the request and, according to an appropriate scheduling strategy, locates slave nodes and starts executors;

(4) the executors establish connections with the driver;

(5) the executors allocate and schedule tasks;

(6) after all tasks are finished, the application program exits.
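Steps (1) to (6) above can be sketched as a toy in-process simulation; the `Master` and `Executor` classes, their methods, and the node names are illustrative assumptions only, not the patent's components:

```python
# Toy simulation of the six-step cluster flow; all names are hypothetical.

class Executor:
    def __init__(self, node, app):
        self.node, self.app = node, app

    def connect(self, driver):           # step (4): link back to the driver
        self.driver = driver

    def run(self, task):                 # step (5): execute one scheduled task
        return f"{self.app}:{task}@{self.node}"

class Master:
    def __init__(self, slaves):
        self.slaves = slaves

    def schedule(self, app, n_executors, resources):
        """Step (3): pick slave nodes and start executors on them."""
        return [Executor(node, app) for node in self.slaves[:n_executors]]

def submit(master, app, tasks):
    driver = "local-driver"                          # step (1)
    executors = master.schedule(app, 2, "1g")        # steps (2)-(3)
    for ex in executors:
        ex.connect(driver)                           # step (4)
    results = [executors[i % len(executors)].run(t)  # step (5): round-robin
               for i, t in enumerate(tasks)]
    return results                                   # step (6): application exits

m = Master(["slave-1", "slave-2", "slave-3"])
print(submit(m, "forecast", ["t1", "t2", "t3"]))
```

The round-robin dispatch stands in for whatever scheduling strategy the optimizer would actually produce.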

In this embodiment, an executor running on the client is not managed or restricted by the master node; distributed technology allows the executors to run on the slave nodes in the cluster, where they are managed and restricted by the master node.

In this embodiment, within the cluster manager, the user selects the corresponding executor through the cluster manager by means of a callback window and the user code.

In the invention, the general flow of distributed algorithm encapsulation is as follows: first, distributed parallel computing, i.e. the master-slave node matching mode; second, algorithm optimization, in which the optimizer optimizes each plan at the planning stage to reduce the amount of computation; third, data sampling, which mainly retrieves existing data from memory together with incoming real-time data; then parameter optimization, refining the overall algorithm, its parameters, and so on; and finally a return to distributed parallel computing, forming a closed-loop structure that enables repeated reuse and updating.

The foregoing shows and describes the general principles, main features, and advantages of the present invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above, which are presented in the specification only to illustrate its principles; various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.
