Smart power grid metering processing method based on distributed computation

Document No.: 632248. Publication date: 2021-05-11. Views: 18.

Reading note: This technology, "Smart power grid metering processing method based on distributed computation" (一种基于分布式计算的智能电网计量处理方法), was designed by 刘剑清陈伟峰谢志伟白金李艺 on 2021-01-13. The invention provides a smart grid metering processing method based on distributed computing, comprising the following steps: acquiring power grid metering data; storing the power grid metering data by adopting a distributed operation data storage mode based on HDFS; extracting the metering data by adopting the rapid metering data extraction mode of the distributed computing framework Spark; converting the extracted power grid metering data into RDD abstract data by adopting the iterative computation mode of the distributed computing framework Spark, and then performing iterative computation on the RDD abstract data; and performing scheduling computation on the iteratively computed data by adopting an Airflow data computation scheduling method, and storing the scheduled data in the HDFS distributed file system. In processing smart grid metering data, the scheme ensures the integrity, accuracy and synchronism of the data during transmission, shortens access time and improves data processing efficiency during processing, and contributes to the development of electric power informatization.

1. A smart grid metering processing method based on distributed computing is characterized by comprising the following steps:

acquiring power grid metering data;

storing the power grid metering data by adopting a distributed operation data storage mode based on an HDFS (Hadoop distributed File System);

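extracting the metering data by adopting a rapid metering data extraction mode of a distributed computing framework Spark;

converting the extracted power grid metering data into RDD abstract data by adopting an iterative computation mode of the distributed computing framework Spark, and then performing iterative computation on the RDD abstract data;

and adopting an Airflow data calculation scheduling method to perform scheduling calculation on the data after iterative calculation, and storing the data after scheduling calculation into an HDFS distributed file system.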

2. The smart grid metering processing method according to claim 1, wherein:

the step of storing the power grid metering data by adopting a distributed operation data storage mode based on the HDFS comprises the following steps of:

constructing an HDFS distributed file storage system, wherein the HDFS distributed file storage system comprises NameNode nodes and a plurality of DataNode nodes;

and carrying out storage read-write operation on the power grid metering data between the NameNode node and the plurality of DataNode nodes by utilizing an MOBUS communication protocol.

3. The smart grid metering processing method according to claim 1, wherein:

the step of performing iterative computation on the RDD abstract data by adopting an iterative computation mode of a distributed computation framework Spark comprises the following steps:

forming a directed acyclic graph DAG from the abstract data RDD through the scheduler interface DAGScheduler of the distributed computing framework Spark, and decomposing the directed acyclic graph DAG into a plurality of scheduling stages;

and decomposing each scheduling stage into a task set through the scheduler interface DAGScheduler of the distributed computing framework Spark, and submitting the task set to a worker node for execution.

Technical Field

The invention relates to the technical field of electric power informatization, in particular to a smart grid metering processing method based on distributed computing.

Background

At the present stage, smart grid technology is developing rapidly, and smart metering analysis technology is being applied to the metering data generated in the smart grid. However, the smart grid in China is still in transition from traditional metering methods to a high-tech metering mode, and the metering technology currently used in the smart grid has shortcomings.

The acquisition equipment for smart grid metering data has a low degree of intelligence: the traditional electromagnetic transformer metering mode has not been completely replaced by serial-port transmission of metering results, and the replacement of old metering equipment with new equipment is incomplete, so the processing of the acquired metering data is imperfect. Meanwhile, the automatic archiving of metering data suffers from information loss, inaccurate data, unsynchronized data and untimely transmission.

When archived metering data is processed, existing methods cannot process massive data in batches and take too long to access the archived data, so the processing efficiency for massive smart grid data is low and cannot meet the current requirements of electric power informatization. In addition, a large amount of metering data cannot be effectively utilized after it is collected.

Furthermore, a large number of old-style electricity meters are still in use in old residential communities in China. Incorporating these traditional meters into the smart grid at low cost and with high efficiency is of great significance: it reduces the need for meter-reading personnel and accelerates the intelligent networking of the power grid.

Disclosure of Invention

The invention aims to provide a smart grid metering processing method based on distributed computing.

In order to achieve the purpose of the invention, the invention provides a smart grid metering processing method based on distributed computing, which comprises the following steps:

acquiring power grid metering data;

storing the power grid metering data by adopting a distributed operation data storage mode based on an HDFS (Hadoop Distributed File System);

extracting the metering data by adopting a rapid metering data extraction mode of a distributed computing framework Spark;

converting the extracted power grid metering data into RDD abstract data by adopting an iterative computation mode of the distributed computing framework Spark, and then performing iterative computation on the RDD abstract data;

and adopting an Airflow data calculation scheduling method to perform scheduling calculation on the data after iterative calculation, and storing the data after scheduling calculation into an HDFS distributed file system.

Further, the step of storing the power grid metering data by adopting a distributed operation data storage mode based on the HDFS comprises:

constructing an HDFS distributed file storage system, wherein the HDFS distributed file storage system comprises NameNode nodes and a plurality of DataNode nodes;

and carrying out storage read-write operation on the power grid metering data between the NameNode node and the plurality of DataNode nodes by utilizing an MOBUS communication protocol.

Further, the step of performing iterative computation on the RDD abstract data by using an iterative computation mode of a distributed computation framework Spark includes:

forming a directed acyclic graph DAG from the abstract data RDD through the scheduler interface DAGScheduler of the distributed computing framework Spark, and decomposing the directed acyclic graph DAG into a plurality of scheduling stages;

and decomposing each scheduling stage into a task set through the scheduler interface DAGScheduler of the distributed computing framework Spark, and submitting the task set to a worker node for execution.

The invention has the following advantages: based on the massive power grid metering data of the smart grid, the metering data is stored by adopting a distributed operation data storage mode based on the HDFS; the rapid metering data extraction mode and the iterative computation mode of the distributed computing framework Spark are then adopted, which effectively shortens access time and improves data processing efficiency; and the processed data is passed to Airflow for data computation scheduling, so that batch processing of the data is realized, data access time is shortened, and access efficiency and processing efficiency are improved.

Drawings

Fig. 1 is a system block diagram of an embodiment of the smart cloud acquisition system of the present invention.

Fig. 2 is a front view of the power acquisition device in an embodiment of the smart cloud acquisition system of the present invention.

Fig. 3 is a side view of the power acquisition device in an embodiment of the smart cloud acquisition system of the present invention.

Fig. 4 is a flowchart of an embodiment of the smart grid metering processing method of the present invention.

Fig. 5 is a diagram of the HDFS architecture in an embodiment of the smart grid metering processing method of the present invention.

Fig. 6 is a schematic diagram of the Hbase processing mechanism in an embodiment of the smart grid metering processing method of the present invention.

Fig. 7 is a sequence diagram of the data list of the HDFS system in an embodiment of the smart grid metering processing method of the present invention.

Fig. 8 is a diagram of the Spark operation framework in an embodiment of the smart grid metering processing method of the present invention.

Fig. 9 is a diagram of the Airflow basic framework in an embodiment of the smart grid metering processing method of the present invention.

Fig. 10 is a flowchart of the Airflow data calculation scheduling step in an embodiment of the smart grid metering processing method of the present invention.

The invention is further explained with reference to the drawings and the embodiments.

Detailed Description

Referring to Figs. 1 to 3, the smart cloud acquisition system includes a plurality of power acquisition devices and a server group. The power acquisition devices are distributed among residential or commercial buildings in various regions. The server group includes a plurality of servers, a plurality of inboard storage device clusters and distributed file servers, which are connected through an MOBUS communication protocol.

The power acquisition device 1 comprises an electric meter box, a horizontal moving assembly, a vertical moving assembly and an acquisition assembly 23. The electric meter box comprises a box body 11 and a plurality of electric meters 12 arranged in the box body 11 in a matrix. The horizontal moving assembly comprises two horizontal supporting frames 211, a horizontal driving motor 212, a first screw rod 213, a first guide rod 214 and a horizontal moving slider 215. The two horizontal supporting frames 211 are respectively arranged at the two horizontal ends of the top of the box body 11, and the first guide rod 214 is fixed between the two horizontal supporting frames 211 along the horizontal direction. The first screw rod 213 is arranged horizontally and is rotatably mounted on the horizontal supporting frames 211 through bearings. The horizontal driving motor 212 is arranged at an axial end of the first screw rod 213, is in driving connection with it, and drives the first screw rod 213 to rotate around its axis. The horizontal moving slider 215 is provided with a first driving screw hole and a first positioning hole, both penetrating through it; the first screw rod 213 passes through and engages with the first driving screw hole, and the horizontally arranged first guide rod 214 passes through the first positioning hole. Driven by the horizontal driving motor 212, the first screw rod 213 rotates and thereby moves the horizontal moving slider 215 in the horizontal direction.

The vertical moving assembly includes a vertical driving motor 223, a second screw rod 221, a second guide rod 222 and a vertical moving slider 224. The second screw rod 221 is arranged in the vertical direction and is rotatably mounted on the horizontal moving slider 215 through a bearing. The vertical driving motor 223 is connected with an axial end of the second screw rod 221 and drives the second screw rod 221 to rotate around its axis. The vertical moving slider 224 is provided with a second driving screw hole and a second positioning hole, both penetrating through it; the second screw rod 221 passes through and engages with the second driving screw hole, and the vertically arranged second guide rod 222 passes through the second positioning hole. A positioning frame 237 is arranged at the axial end of the second screw rod 221 opposite the vertical driving motor 223, the second screw rod 221 is rotatably mounted on the positioning frame 237, and the second guide rod 222 is fixedly connected between the horizontal moving slider 215 and the positioning frame 237. Driven by the vertical driving motor 223, the second screw rod 221 rotates and thereby moves the vertical moving slider 224 in the vertical direction.

The acquisition assembly 23 includes a housing 231, a camera 232 and a main control circuit board 235. The camera 232 and the main control circuit board 235 are both arranged on the housing 231, and the housing 231 is arranged on the vertical moving slider 224. The camera 232 and a power module are each connected with the main control circuit board 235. A convex lens 233 is provided on the housing 231, a plurality of LED lamps 234 are located around the periphery of the convex lens 233, and the camera 232 faces the convex lens 233.

The main control circuit board 235 is connected with the vertical driving motor 223 and the horizontal driving motor 212 through a data line 236 and a data line 216, respectively, and the data line 236 and the data line 216 are each provided with a drag chain. The acquisition assembly 23 can thus be driven to move in the horizontal and vertical directions. The LED lamps 234 provide fill lighting toward the electric meter 12, and the camera 232, which is arranged facing the electric meter 12, photographs the reading of the electric meter 12 through the convex lens 233 to acquire the original power grid metering data. Because a remote communication module is provided on the main control circuit board 235, the main control circuit board 235 can connect with the plurality of servers through the remote communication module.
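The two-axis movement of the acquisition assembly over the meter matrix can be illustrated with a small scan plan. The following is only a minimal Python sketch: the serpentine visiting order, the function name `scan_positions` and the pitch parameters are assumptions made for illustration, since the text does not specify the scan order or the spacing of the meters.

```python
from typing import List, Tuple


def scan_positions(rows: int, cols: int,
                   pitch_x_mm: float, pitch_y_mm: float) -> List[Tuple[float, float]]:
    """Return (x, y) slider targets that visit every meter in the matrix once.

    Assumption: meters sit on a regular grid and the camera sweeps row by row,
    reversing direction on every other row so that it never travels back
    across a full row between captures.
    """
    positions: List[Tuple[float, float]] = []
    for r in range(rows):
        # Even rows go left to right, odd rows right to left.
        col_order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in col_order:
            positions.append((c * pitch_x_mm, r * pitch_y_mm))
    return positions


if __name__ == "__main__":
    for x, y in scan_positions(rows=2, cols=3, pitch_x_mm=150.0, pitch_y_mm=200.0):
        print(f"horizontal slider -> {x} mm, vertical slider -> {y} mm, capture image")
```

Each returned pair would correspond to one target for the horizontal moving slider and one for the vertical moving slider before the camera takes a picture.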

Referring to Fig. 4, in the smart grid metering processing method based on distributed computing, step S1 is executed first: original metering data is collected through the smart grid metering equipment of the power acquisition device 1, and the captured original power grid metering image data is recognized through image recognition to obtain the power grid metering data.

Step S2 is then executed: the power grid metering data is stored by adopting a distributed operation data storage mode based on the HDFS. Hadoop implements the distributed file system HDFS based on the Google File System model proposed by Google. The HDFS is generally deployed on a plurality of ordinary computer hosts, so that if only one or a few nodes in the cluster fail, the whole system can still work normally and the distributed file system architecture continues to function. The HDFS also offers high data access performance and is well suited to processing today's massive data. The HDFS provides an operation interface and presentation similar to those of a general file system, and users can perform file operations such as renaming, moving, deleting and creating.

Referring to Fig. 5, the HDFS distributed file system adopts a master-slave structure. A single NameNode runs on one host in the system, each node can run a DataNode, and the NameNode and the plurality of DataNodes form a complete system. As the only master, the NameNode has authority over the naming of the system and resolves client file access operations such as adding, deleting and reading files. The DataNodes manage the data. Data is stored in the DataNodes in the form of blocks, and the blocks are assembled into files or directories through the records in the NameNode. For the most critical node, the NameNode, a backup node, the secondary name node (SecondaryNameNode), is established to guard against NameNode faults during operation.

The NameNode is responsible for storing the metadata of all data in the HDFS, including file information, the file blocks corresponding to each file, and the location of each file block in the DataNodes. When a client operates on the system, it only needs to obtain or update the metadata on the NameNode, while the actual data operations are carried out with the corresponding DataNodes, which greatly reduces the workload and network pressure of the NameNode.

In the HDFS architecture, the NameNode, the DataNodes and the client communicate reliably over TCP/IP, and within the program the communication is implemented by remote procedure calls.

During a read operation, the client requests the metadata of the corresponding file from the NameNode and then reads the corresponding blocks from the corresponding DataNodes.

During a write operation, when the client uses the MOBUS communication protocol to perform storage read-write operations on the power grid metering data between the NameNode and the plurality of DataNodes, the client first caches the data to be written in a cache folder. When the cached data reaches the pre-configured block size, the client sends the write-file information to the NameNode. The NameNode adds the corresponding metadata and returns the location where the new data is to be stored, namely the DataNode information and the block information, and the client then writes the data to the location specified by the NameNode.
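The division of labour between the NameNode (metadata only) and the DataNodes (block contents) can be illustrated with a toy in-memory model. This is a minimal Python sketch, not HDFS code: the classes `ToyNameNode` and `ToyClient`, the round-robin block placement and the dictionary standing in for DataNode storage are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where each block lives."""

    def __init__(self, datanodes: List[str]) -> None:
        self.datanodes = datanodes
        self.files: Dict[str, List[Tuple[int, str]]] = {}  # path -> [(block_id, datanode)]
        self._next_block = 0

    def allocate_block(self, path: str) -> Tuple[int, str]:
        # Round-robin placement is an assumption made for brevity.
        block_id = self._next_block
        node = self.datanodes[block_id % len(self.datanodes)]
        self._next_block += 1
        self.files.setdefault(path, []).append((block_id, node))
        return block_id, node

    def locate(self, path: str) -> List[Tuple[int, str]]:
        return self.files[path]


class ToyClient:
    def __init__(self, namenode: ToyNameNode,
                 storage: Dict[str, Dict[int, bytes]], block_size: int) -> None:
        self.namenode = namenode
        self.storage = storage  # datanode name -> {block_id: block bytes}
        self.block_size = block_size

    def write(self, path: str, chunks: Iterable[bytes]) -> None:
        buffer = b""
        for chunk in chunks:
            buffer += chunk  # cache incoming data on the client side
            while len(buffer) >= self.block_size:
                self._flush(path, buffer[:self.block_size])
                buffer = buffer[self.block_size:]
        if buffer:
            self._flush(path, buffer)  # final, partially filled block

    def _flush(self, path: str, block: bytes) -> None:
        # Ask the NameNode where to put the block, then write to that DataNode.
        block_id, node = self.namenode.allocate_block(path)
        self.storage[node][block_id] = block

    def read(self, path: str) -> bytes:
        # Fetch metadata from the NameNode, then read each block from its DataNode.
        return b"".join(self.storage[node][block_id]
                        for block_id, node in self.namenode.locate(path))
```

In this model the NameNode never sees file contents; the client only consults it for block locations, which mirrors the reduced NameNode workload described above.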

The HDFS guarantees the reliability and availability of the system by adopting a redundancy strategy. A file in the HDFS can be stored as several replicas as needed. The number of replicas is generally three, and the three replicas are placed on different nodes which, from near to far, are one node, another node in the same rack, and a node in a different rack.
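The near-to-far placement of the three replicas can be sketched as follows. This is a minimal Python illustration of the order described in the paragraph above, not the placement policy code of HDFS itself; the function `place_replicas`, the `racks` mapping, and the choice of the writing node as the first replica location are assumptions.

```python
from typing import Dict, List


def place_replicas(writer: str, racks: Dict[str, List[str]]) -> List[str]:
    """Pick three nodes: the writer's node, a same-rack peer, and a node on another rack.

    Raises StopIteration if the topology has no same-rack peer or no second rack.
    """
    local_rack = next(rack for rack, nodes in racks.items() if writer in nodes)
    same_rack_peer = next(n for n in racks[local_rack] if n != writer)
    remote_node = next(nodes[0] for rack, nodes in racks.items()
                       if rack != local_rack and nodes)
    return [writer, same_rack_peer, remote_node]


if __name__ == "__main__":
    topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    print(place_replicas("dn1", topology))
```

Spreading the replicas across two racks means that the loss of a single node or a single rack still leaves at least one copy available.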

Referring to Fig. 6, Hbase is a distributed, column-oriented database built on the HDFS file system of Hadoop, and it is a preferred tool for storing data in the current big data field. Hbase is often described as a sparse, distributed, persistent, multidimensional sorted map that is indexed by row key, column key and timestamp, and it serves as a randomly accessible platform for storing and retrieving data. Hbase does not restrict the type of data stored and does not emphasize the relationships between data. It is designed to run on a server cluster and can therefore be scaled horizontally. It is widely applied in fields such as internet search, information interaction and incremental data capture, where incremental data capture can be used for capturing monitoring metrics, capturing user interaction data, telemetry, targeted advertisement delivery and the like.
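The idea of a sorted map indexed by row key, column key and timestamp can be illustrated with a toy table. This is a minimal Python sketch of the data model only, not the Hbase client API; the class `ToyTable` and the example row and column names (such as `meter-0001` and `reading:kwh`) are hypothetical.

```python
from typing import Dict, Iterator, Optional, Tuple

Cell = Tuple[str, str, int]  # (row key, column key, timestamp)


class ToyTable:
    """Sparse map from (row key, column key, timestamp) to a value."""

    def __init__(self) -> None:
        self._cells: Dict[Cell, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if the cell is absent."""
        versions = [(ts, v) for (r, c, ts), v in self._cells.items()
                    if r == row and c == column]
        return max(versions, key=lambda tv: tv[0])[1] if versions else None

    def scan(self, start_row: str, stop_row: str) -> Iterator[Tuple[Cell, bytes]]:
        """Yield cells with start_row <= row key < stop_row, newest version first."""
        ordered = sorted(self._cells, key=lambda k: (k[0], k[1], -k[2]))
        for key in ordered:
            if start_row <= key[0] < stop_row:
                yield key, self._cells[key]


if __name__ == "__main__":
    table = ToyTable()
    table.put("meter-0001", "reading:kwh", 1, b"10.5")
    table.put("meter-0001", "reading:kwh", 2, b"11.0")
    print(table.get("meter-0001", "reading:kwh"))
```

Because absent cells simply have no entry, the map is sparse, and keeping the keys sorted by row is what makes range scans over consecutive meters cheap.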

Because the data processing module involves many classes, the HDFS system data list sub-model merges each interface class with its corresponding implementation class in the class diagram, names the merged class after the interface, and uses it to represent a class object. The sequence diagram of the HDFS system data list is shown in Fig. 7.

(1) getDirectoryFromHdfs(): traversing the files and directories on the HDFS and acquiring their detailed information HDFSList, including the owning user, the file size and the creation date;

(2) refresh (): reloading the HDFS file list;

(3) CreateFile (dst, contents): creating a new file, dst represents a path, and contents are contents;

(4) mkdir (path): creating a new directory, wherein path is the path of the directory;

(5) deletefile (filename): deleting the file according to the file path;

(6) upload (src, dst): uploading a local file to an HDFS (Hadoop distributed file system), wherein src represents a local path where the file is located, and dst represents a target path on the HDFS;

(7) download (src, dst): downloading files on the HDFS to a local, wherein src represents a path where the files on the HDFS are located, and dst represents a local target path;

(8) renamefile (fromfileName, tofileName): renaming the file, wherein fromfileName represents the file name before modification, and tofileName represents the file name after modification;

(9) readfile (filename): displaying the file content according to the file path, wherein filename represents the file path;

(10) iffileexits (filename): checking whether a file exists on the HDFS system according to the file name, wherein filename represents the file name.
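Most of the operations listed above have a direct counterpart in the standard `hdfs dfs` command-line shell. The following Python sketch only builds the argument lists and does not run them; the helper `hdfs_command` and its short operation keys are hypothetical names, and the mapping is an illustration rather than the implementation behind the class diagram. Operations (2) and (3) are left out because they are not a single shell command in this sketch.

```python
from typing import Dict, List

# Operation key -> `hdfs dfs` sub-command. The keys are hypothetical short names
# for items (1) and (4) to (10) of the list above.
_TEMPLATES: Dict[str, List[str]] = {
    "list": ["hdfs", "dfs", "-ls", "-R"],         # (1) traverse files and directories
    "mkdir": ["hdfs", "dfs", "-mkdir", "-p"],     # (4) create a directory
    "deletefile": ["hdfs", "dfs", "-rm"],         # (5) delete a file
    "upload": ["hdfs", "dfs", "-put"],            # (6) local file -> HDFS
    "download": ["hdfs", "dfs", "-get"],          # (7) HDFS -> local file
    "renamefile": ["hdfs", "dfs", "-mv"],         # (8) rename a file
    "readfile": ["hdfs", "dfs", "-cat"],          # (9) print file content
    "exists": ["hdfs", "dfs", "-test", "-e"],     # (10) exit code 0 if the path exists
}


def hdfs_command(operation: str, *args: str) -> List[str]:
    """Return the argv list for one file operation; raises KeyError if unknown."""
    return _TEMPLATES[operation] + list(args)


if __name__ == "__main__":
    print(" ".join(hdfs_command("upload", "/tmp/readings.csv", "/metering/readings.csv")))
```

On a machine with a configured Hadoop client, a returned list could be passed to `subprocess.run(..., check=True)`; the paths shown are placeholders.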

The steps above describe how the metering data in the system is processed on the HDFS system. The metering data collected by the smart grid metering equipment is then further processed by the Spark distributed computing framework, which makes the computation more efficient.

Step S3 is then executed: the metering data is extracted by adopting the rapid metering data extraction mode of the distributed computing framework Spark. Spark has three main operation modes: Spark on YARN mode, standalone mode and Spark on Mesos mode. The standalone mode is a resource management mode of the Spark cluster itself; it can be deployed independently in a Spark cluster and implements resource management and a fault tolerance mechanism. The Spark on YARN mode uses YARN to manage resources, with each executor running inside a YARN container. The Spark on Mesos mode uses Mesos for resource management. The two deployment modes Spark on YARN and Spark on Mesos use an external resource management component for resource management; both were developed on the basis of the standalone mode, and they achieve compatibility with the Hadoop system and systematic management of resources.

A three-node cluster is deployed as a Spark cluster in standalone mode. The nodes are named A, B and C, where node A is the master node and the others are worker nodes. As shown in Fig. 8, when a Spark task is submitted on node A through spark-submit, the cluster creates a main process on node A. This process, the driver process, is responsible for resource allocation for jobs, generation of the DAG and task scheduling. The driver distributes tasks to the worker nodes, where the tasks are executed; on each worker node a process for executing the job is created, which is the executor process.
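The three operation modes differ mainly in the master URL handed to spark-submit. The sketch below builds such a command line in Python; the helper `spark_submit_command`, the host name `A` and the application file `metering_job.py` are hypothetical, while `--master`, the `spark://`, `yarn` and `mesos://` URL forms, and the default ports 7077 and 5050 follow the usual Spark and Mesos conventions.

```python
from typing import Dict, List


def spark_submit_command(app: str, mode: str, host: str = "") -> List[str]:
    """Build a spark-submit argv list for one of the three deployment modes."""
    masters: Dict[str, str] = {
        "standalone": f"spark://{host}:7077",  # Spark's own cluster manager
        "yarn": "yarn",                        # cluster location comes from the Hadoop config
        "mesos": f"mesos://{host}:5050",       # external Mesos master
    }
    return ["spark-submit", "--master", masters[mode], app]


if __name__ == "__main__":
    # Submitting on master node A of the three-node standalone cluster.
    print(" ".join(spark_submit_command("metering_job.py", "standalone", host="A")))
```

The command is only assembled here, not executed, so the sketch can be run without a Spark installation.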

The operation architecture mainly contains the following components: (1) Data storage component: Spark stores data using the HDFS file system and can work with any Hadoop-compatible data source, including HDFS, Hbase and the like. (2) API component: application developers can create Spark-based applications using the standard API (application programming interface), which is available for programming languages such as Java and Python. (3) Resource manager component: Spark can be deployed on a single server or on a distributed computing framework such as Mesos.

Step S4 is then executed: the extracted power grid metering data is converted into RDD abstract data by adopting the iterative computation mode of the distributed computing framework Spark, and iterative computation is performed on the RDD abstract data.

First, the smart grid electric energy metering data stored in the HDFS distributed file system is loaded into the distributed computing framework Spark for data extraction, and the filtered data is subjected to iterative computation.

Next, the loaded metering data is abstracted and converted into RDD abstract data in the driver, which registers its information with the task manager and applies for the allocation of computing resources.

Then, the abstract data RDDs and the edges between them are combined into a directed acyclic graph DAG by the DAGScheduler of the high-level scheduler interface in Spark, and the DAG is decomposed into different scheduling stages.

Next, the different scheduling stages are decomposed into task sets (TaskSet) by the DAGScheduler of the high-level scheduler interface in Spark, and the task sets are submitted to the worker nodes for execution.

Before executing a task, each worker node registers with the driver process; once registration is complete and the worker node is ready, the task is executed.

During task execution, each worker node reports its task execution status to the task manager, and the task manager monitors the data processing status so that the worker nodes are fully utilized during data processing.

Finally, after the metering data in the distributed file storage system has been processed, the task manager clears the information registered by the driver in preparation for the next computation.
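The decomposition of the DAG into scheduling stages and of each stage into a task set can be illustrated with a simplified model. The Python sketch below is not Spark code: it assumes a purely linear lineage, starts a new stage at every transformation that needs a shuffle, and creates one task per partition. The names `Transformation`, `split_into_stages` and `build_task_sets` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transformation:
    name: str
    wide: bool  # True if a shuffle is needed (e.g. reduceByKey), False if narrow (e.g. map)


def split_into_stages(lineage: List[Transformation]) -> List[List[str]]:
    """Cut a linear lineage into stages; each wide transformation opens a new stage."""
    stages: List[List[str]] = [[]]
    for t in lineage:
        if t.wide and stages[-1]:
            stages.append([])
        stages[-1].append(t.name)
    return stages


def build_task_sets(stages: List[List[str]],
                    num_partitions: int) -> List[List[Tuple[int, int]]]:
    """Create one task set per stage, with one (stage_id, partition) task per partition."""
    return [[(stage_id, p) for p in range(num_partitions)]
            for stage_id in range(len(stages))]


if __name__ == "__main__":
    lineage = [
        Transformation("textFile", False),
        Transformation("filter", False),
        Transformation("map", False),
        Transformation("reduceByKey", True),
        Transformation("mapValues", False),
    ]
    stages = split_into_stages(lineage)
    print(stages)
    print(build_task_sets(stages, num_partitions=2))
```

Narrow transformations are pipelined inside one stage because each output partition depends on a single input partition, whereas a shuffle forces the preceding stage to finish first.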

Step S5 is then executed: scheduling computation is performed on the iteratively computed data by adopting an Airflow data computation scheduling method, and the scheduled data is stored in the HDFS distributed file system. Referring to Fig. 9, the cluster roles in the Airflow architecture include the following:

webserver: provides the web service, and periodically spawns a sub-process to scan the DAGs under the corresponding directory and update the database.

scheduler: provides the scheduling service, generates tasks according to the DAGs, and submits the tasks to the message middleware queue (redis or rabbitMq).

Celery worker: distributed on different machines as the actual execution nodes of the tasks, obtaining tasks by listening to the message middleware (redis or rabbitMq). Celery is a queuing mechanism that generally consists of two components, a Broker and a Result backend: the Broker stores the commands to be executed, and the Result backend stores the state information of the executed commands.

flower: monitors whether the worker processes are alive, starts or stops worker processes, and views the running tasks.

executor: performs the execution of the tasks.
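The split between a Broker that holds commands waiting to run and a Result backend that holds their state can be illustrated with a toy queue. This is a minimal in-process Python sketch, not Celery or Airflow code; `ToyBroker`, `run_worker` and the task names are hypothetical, and a real deployment would use redis or rabbitMq as the broker.

```python
import queue
from typing import Callable, Dict, Tuple


class ToyBroker:
    """Stores commands waiting to be executed, in submission order."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Tuple[str, Callable[[], None]]]" = queue.Queue()

    def submit(self, task_id: str, command: Callable[[], None]) -> None:
        self._queue.put((task_id, command))

    def next_task(self) -> Tuple[str, Callable[[], None]]:
        return self._queue.get_nowait()

    def empty(self) -> bool:
        return self._queue.empty()


def run_worker(broker: ToyBroker, result_backend: Dict[str, str]) -> None:
    """Drain the broker, run each command, and record its state in the result backend."""
    while not broker.empty():
        task_id, command = broker.next_task()
        try:
            command()
            result_backend[task_id] = "success"
        except Exception:
            result_backend[task_id] = "failed"


if __name__ == "__main__":
    broker, results = ToyBroker(), {}
    broker.submit("extract_readings", lambda: None)
    run_worker(broker, results)
    print(results)
```

Keeping commands and states in separate stores lets the scheduler enqueue work without waiting for workers, while monitoring tools read states without touching the queue.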

Referring to Fig. 10, in order to make full use of the smart grid metering data, a data computation scheduling method based on Airflow is designed. The data produced by Spark iterative computation is scheduled and used, the advantages of Airflow are fully exploited in the process, and the processing efficiency for historical data is improved.

The Airflow-based data calculation scheduling method comprises the following specific working steps:

First, a task is started and a task list is generated, and the task list is processed by the distributed computing framework Spark to obtain the iteratively computed data.

Then, after the task is started, the scheduler schedules the timing service and the DAG generates the task definitions. The defined tasks and their dependencies are entered, the scheduler's timing scan stores all tasks in the database, and whether the timing condition is met is judged; when the timing condition is met, the model is parsed.

Then, when the metering data iterated by the Spark distributed computing framework is exported successfully, the data computation scheduling continues downstream through Airflow, the computed data is stored in the HDFS distributed file system, and data is obtained from the HDFS system. When the export of the original metering data fails, the timing task fails and the task log record is stored in the source database.
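The working steps above (check the timing condition, run tasks in dependency order, continue downstream only after a successful export, and log failures) can be illustrated with a small dependency runner. This is a minimal Python sketch using the standard-library `graphlib` module (Python 3.9 or later), not an Airflow DAG definition; the function `run_dag`, the task names and the state strings are assumptions made for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(tasks: Dict[str, Callable[[], None]],
            upstream: Dict[str, Set[str]],
            timing_met: bool,
            log: List[str]) -> Dict[str, str]:
    """Run tasks in dependency order and return the final state of each task.

    Every name used in `upstream` is assumed to be a key of `tasks`.
    """
    state: Dict[str, str] = {}
    if not timing_met:
        return state  # nothing is scheduled until the timing condition holds
    graph = {name: upstream.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        if any(state[u] != "success" for u in upstream.get(name, set())):
            # An upstream task did not succeed, so this task is not run.
            state[name] = "upstream_failed"
            log.append(f"{name} skipped: upstream task failed")
            continue
        try:
            tasks[name]()
            state[name] = "success"
        except Exception as exc:
            state[name] = "failed"
            log.append(f"{name} failed: {exc!r}")
    return state


if __name__ == "__main__":
    task_log: List[str] = []
    demo_tasks = {
        "spark_export": lambda: None,
        "airflow_compute": lambda: None,
        "store_to_hdfs": lambda: None,
    }
    demo_upstream = {"airflow_compute": {"spark_export"},
                     "store_to_hdfs": {"airflow_compute"}}
    print(run_dag(demo_tasks, demo_upstream, timing_met=True, log=task_log))
```

If `spark_export` raised an exception, the two downstream tasks would be marked as not run and the log list would hold one entry per affected task, which corresponds to storing a task log record when the export fails.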

Therefore, the invention designs a metering data storage method based on the HDFS distributed file system, which alleviates the insufficient storage capacity available for the existing massive metering data. Based on a distributed computing method, it uses the rapid metering data extraction mode of the distributed computing framework Spark to strengthen the management of the metering data, improve the integrity of the metering file information, and ensure the synchronism, accuracy and timeliness of the data during transmission.

When archived metering data is processed, existing methods cannot process massive data in batches and take too long to access the archived data, so the processing efficiency for massive smart grid data is low and cannot meet the current requirements of electric power informatization. To address these problems, the invention uses the iterative computation mode of the distributed computing framework Spark to convert the archived metering data, which solves the problems that massive metering data cannot be processed in batches, that accessing the metering data takes a long time, and that efficiency is low. The application of the invention can greatly help the development of electric power informatization and improve data processing capacity.

At the present stage, a large amount of metering data cannot be effectively utilized after it is collected. In order to make full use of the smart grid metering data, a data computation scheduling method based on Airflow is designed: the data produced by Spark iterative computation is scheduled and used, the advantages of Airflow are fully exploited in the process, and the processing efficiency for historical data is improved.
