Parallel execution method and system for streaming document parsing

Document No.: 763055 Publication date: 2021-04-06

Abstract: This technology, a parallel execution method and system for streaming document parsing (一种流式文档解析的并行执行方法及系统), was designed by 潘飚, 殷博, and 袁春峰 on 2020-12-28. The invention discloses a parallel execution method and system for streaming document parsing. The method comprises the following steps: S1, receiving a document and starting the streaming parsing service; S2, creating a worker thread group by a preset method; S3, acquiring a data packet, looking up the coroutine context, and switching from the worker thread to the designated coroutine; S4, after switching to the designated coroutine, parsing the message; S5, after the message has been processed, switching from the designated coroutine back to the worker thread according to a preset rule; and S6, releasing the designated coroutine when document processing ends. Beneficial effects: the invention introduces coroutine technology into streaming parsing. Because a coroutine switch automatically restores the function call stack, no complex context switching logic is needed. Coroutine switching is completed in user mode and can therefore be very efficient. In addition, the number of threads that execute coroutines can be chosen as needed, so the multi-core CPU resources of the system can be fully utilized.

1. A method for parallel execution of streaming document parsing, the method comprising the steps of:

S1, receiving a document and starting the streaming parsing service;

S2, creating a worker thread group by a preset method;

S3, acquiring a data packet, looking up the coroutine context, and switching from the worker thread to the designated coroutine;

S4, after switching to the designated coroutine, parsing the message;

S5, after the message has been processed, switching from the designated coroutine back to the worker thread according to a preset rule;

and S6, releasing the designated coroutine when document processing ends.

2. The method of claim 1, wherein the preset method is to configure the number of worker threads in the group according to factors including, but not limited to, hardware configuration or network traffic.

3. The method of claim 1, wherein acquiring the data packet, looking up the coroutine context, and switching from the worker thread to the designated coroutine further comprises the steps of:

S31, the main thread reads data packets from the network device and, whenever a data packet is obtained, distributes it to a designated worker thread according to a certain rule;

S32, the worker thread takes the data packet from its queue and checks it against the documents already being parsed; when the packet belongs to an existing document, the coroutine context of that document is looked up according to connection features and document features, and when the packet belongs to a new document, the system creates a new coroutine;

S33, the designated coroutine, obtained by lookup or by creation, is switched into.

4. The method of claim 3, wherein the certain rule includes, but is not limited to, a five-tuple rule, the source IP address, and the destination address.

5. The method of claim 3, wherein the system creating a new coroutine further comprises:

S321, allocating stack resources for the new coroutine;

and S322, creating a new coroutine context.

6. The method of claim 3, wherein obtaining the designated coroutine by lookup or creation and switching into the designated coroutine further comprises:

S331, saving the context of the current coroutine;

S332, modifying the current coroutine identifier;

S333, switching the stack pointer;

and S334, calling the context-swap function of the operating system to switch the user context.

7. The method of claim 1, wherein the preset rule comprises the following steps:

S51, the currently designated coroutine yields the right of execution;

S52, the current coroutine is switched out, and execution returns to the state before the designated coroutine was entered;

and S53, the coroutine waits for the next scheduling.

8. The method of claim 1, wherein releasing the designated coroutine when document processing ends further comprises the steps of:

S61, releasing the resources of the designated coroutine;

and S62, removing the designated coroutine from the coroutine list.

9. The method of claim 8, wherein the resources include, but are not limited to, a stack and a context.

10. A parallel execution system for streaming document parsing, for performing the steps of the parallel execution method for streaming document parsing as claimed in any one of claims 1-9, the system comprising:

a starting module, configured to receive a document and start the streaming parsing service;

a worker thread group module, configured to create a worker thread group by a preset method;

a coroutine switching module, configured to acquire a data packet, look up the coroutine context, and switch from the worker thread to the designated coroutine;

a message parsing module, configured to parse the message after switching to the designated coroutine;

a worker thread switching module, configured to switch from the designated coroutine back to the worker thread according to a preset rule after the message has been processed;

and a coroutine releasing module, configured to release the designated coroutine when document processing ends.

Technical Field

The invention relates to the technical field of document parsing, in particular to a parallel execution method and system for streaming document parsing.

Background

In network security technology, traffic restoration is a basic technical means. To discover attacks, intrusions, data leaks, and similar behavior in network traffic quickly and in real time, streaming parsing of the files carried in that traffic has become necessary; that is, parsing of a document starts when only part of its content has been captured.

Although streaming document parsing can discover document content transmitted over a network in a timely manner, it also faces several difficulties. One of them is concurrency: hundreds or even thousands of documents may be in transit on the network at the same time, and it is not known in advance which document the next incoming message belongs to. How to process these documents simply and efficiently is a problem that must be addressed.

Currently, there are two main solutions:

1. Multi-thread mode: a parsing thread is created for each document, and a separate dispatch thread collects data packets and distributes each received packet to the correct parsing thread. Each parsing thread works in the context of only one file. This scheme is simple to implement, but its drawback is equally clear: it consumes many resources and is unsuitable for high-traffic network environments.

2. Single-thread mode: only one thread is responsible for parsing documents. The thread records the context information of all documents; when a data packet is received, it finds the corresponding context, restores that context, and continues processing. This approach saves resources, but the programming is relatively complex and the resources of the system are not fully utilized.

No effective solution to these problems in the related art has yet been proposed.

Disclosure of Invention

In view of the problems in the related art, the invention provides a parallel execution method and system for streaming document parsing, so as to overcome the technical problems of the existing related art.

To this end, the invention adopts the following specific technical solution:

According to one aspect of the present invention, there is provided a parallel execution method for streaming document parsing, the method comprising the steps of:

S1, receiving a document and starting the streaming parsing service;

S2, creating a worker thread group by a preset method;

S3, acquiring a data packet, looking up the coroutine context, and switching from the worker thread to the designated coroutine;

S4, after switching to the designated coroutine, parsing the message;

S5, after the message has been processed, switching from the designated coroutine back to the worker thread according to a preset rule;

and S6, releasing the designated coroutine when document processing ends.

Further, the preset method is to configure the number of worker threads in the group according to factors including, but not limited to, hardware configuration or network traffic.

Further, acquiring the data packet, looking up the coroutine context, and switching from the worker thread to the designated coroutine further comprises the following steps:

S31, the main thread reads data packets from the network device and, whenever a data packet is obtained, distributes it to a designated worker thread according to a certain rule;

S32, the worker thread takes the data packet from its queue and checks it against the documents already being parsed; when the packet belongs to an existing document, the coroutine context of that document is looked up according to connection features and document features, and when the packet belongs to a new document, the system creates a new coroutine;

S33, the designated coroutine, obtained by lookup or by creation, is switched into.

Further, the certain rule includes, but is not limited to, a five-tuple rule, the source IP address, and the destination address.

Further, the system creating a new coroutine further comprises the following steps:

S321, allocating stack resources for the new coroutine;

and S322, creating a new coroutine context.

Further, obtaining the designated coroutine by lookup or creation and switching into the designated coroutine further comprises the following steps:

S331, saving the context of the current coroutine;

S332, modifying the current coroutine identifier;

S333, switching the stack pointer;

S334, calling the swapcontext function of the operating system to switch the user context.

Further, the preset rule comprises the following steps:

S51, the currently designated coroutine yields the right of execution;

S52, the current coroutine is switched out, and execution returns to the state before the designated coroutine was entered;

and S53, the coroutine waits for the next scheduling.

Further, releasing the designated coroutine when document processing ends further comprises the following steps:

S61, releasing the resources of the designated coroutine;

and S62, removing the designated coroutine from the coroutine list.

Further, the resources include, but are not limited to, a stack and a context.

According to another aspect of the present invention, there is also provided a parallel execution system for streaming document parsing, the system including:

a starting module, configured to receive a document and start the streaming parsing service;

a worker thread group module, configured to create a worker thread group by a preset method;

a coroutine switching module, configured to acquire a data packet, look up the coroutine context, and switch from the worker thread to the designated coroutine;

a message parsing module, configured to parse the message after switching to the designated coroutine;

a worker thread switching module, configured to switch from the designated coroutine back to the worker thread according to a preset rule after the message has been processed;

and a coroutine releasing module, configured to release the designated coroutine when document processing ends.

The invention has the beneficial effects that:

1. The invention introduces coroutine technology into streaming parsing. A coroutine is a lightweight, user-mode, cooperatively scheduled component that can be executed, suspended, and resumed; it is more general and flexible than a subroutine and lighter than a thread. Because a coroutine switch automatically restores the function call stack, the parsing code needs no complex context switching logic. Coroutine switching is completed in user mode and can therefore be very efficient. In addition, the number of threads that execute coroutines can be chosen as needed, so the multi-core CPU resources of the system can be fully utilized. The invention thus solves the problem of switching document contexts during streaming document parsing: parsing multiple documents simultaneously becomes similar to parsing a single document independently, which greatly reduces the programming complexity of processing multiple documents in parallel.

2. The invention combines multiple threads with multiple coroutines. This avoids the low resource utilization of the single-thread mode, and it also avoids the excessive number of threads and the excessive thread switching overhead of the multi-thread mode.

3. Because the creation, switching, and destruction of coroutines are all completed in user mode and do not involve time-consuming system calls, the number of coroutines can reach tens of thousands or more without additional performance loss. Moreover, coroutine switching is transparent to the application layer, which appears to be processing a single document independently and needs no complex operations such as context switching and restoration, so programming complexity is greatly reduced.

Drawings

To illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, the drawings needed for the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow diagram of a method for parallel execution of streaming document parsing in accordance with an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of coroutine-based streaming document parsing in a parallel execution method of streaming document parsing according to an embodiment of the present invention;

FIG. 3 is a system block diagram of a parallel execution system for streaming document parsing in accordance with an embodiment of the present invention.

In the figures:

1. starting module; 2. worker thread group module; 3. coroutine switching module; 4. message parsing module; 5. worker thread switching module; 6. coroutine releasing module.

Detailed Description

To further explain the embodiments, the invention is provided with drawings. The drawings form part of this disclosure; they mainly illustrate the embodiments and, together with the description, explain their principles of operation. With reference to them, those of ordinary skill in the art will understand other possible embodiments and the advantages of the invention. Components in the drawings are not drawn to scale, and like reference numerals generally denote like components.

According to embodiments of the invention, a parallel execution method and system for streaming document parsing are provided.

The present invention is further described below with reference to the accompanying drawings and the detailed description. As shown in FIGS. 1-2, a method for parallel execution of streaming document parsing according to an embodiment of the present invention includes the following steps:

S1, receiving a document and starting the streaming parsing service;

S2, creating a worker thread group by a preset method;

The worker thread group is responsible for actually executing the parsing function; once created, the threads are not adjusted further. The worker thread group may be created with the thread creation facility provided by the operating system (e.g., pthread_create on Linux, or CreateThread on the Windows platform). All worker threads execute the same logic: each repeatedly takes data packets from its own message queue and schedules the appropriate coroutine to parse them.

S3, acquiring a data packet, looking up the coroutine context, and switching from the worker thread to the designated coroutine;

The process of acquiring data packets is similar to a traditional packet capture and analysis process. The main difference is that, after a data packet is acquired, it is not parsed directly but is dispatched to the message queue of a worker thread.

S4, after switching to the designated coroutine, parsing the message;

S5, after the message has been processed, switching from the designated coroutine back to the worker thread according to a preset rule;

and S6, releasing the designated coroutine when document processing ends.
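
By way of non-limiting illustration, the worker thread logic described for step S2 can be sketched as follows. The sketch is in Python rather than the pthread_create or CreateThread calls named above; the names `STOP`, `worker_loop`, `start_workers`, and `stop_workers` are hypothetical and do not appear in the invention. In standard CPython builds the global interpreter lock prevents these threads from executing bytecode on several cores at once, so the sketch shows only the structure (one message queue per worker, the same loop in every worker), not the multi-core speed-up.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel that tells a worker to exit


def worker_loop(inbox, handle_packet):
    """Repeatedly take packets from this worker's own queue and handle them."""
    while True:
        packet = inbox.get()      # blocks until a packet is available
        if packet is STOP:
            break
        handle_packet(packet)     # in the invention: schedule the right coroutine


def start_workers(count, handle_packet):
    """Create `count` worker threads, each with its own message queue."""
    inboxes = [queue.Queue() for _ in range(count)]
    threads = [
        threading.Thread(target=worker_loop, args=(inbox, handle_packet))
        for inbox in inboxes
    ]
    for thread in threads:
        thread.start()
    return inboxes, threads


def stop_workers(inboxes, threads):
    """Ask every worker to exit and wait until all of them have finished."""
    for inbox in inboxes:
        inbox.put(STOP)
    for thread in threads:
        thread.join()
```

The main thread would call `start_workers` once at start-up and then place each captured packet into one of the returned queues.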

In one embodiment, the preset method is to configure the number of worker threads in the group according to factors including, but not limited to, hardware configuration or network traffic.
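
One possible way to derive the thread count from the hardware configuration is sketched below. The heuristic (all detected cores minus one reserved for the packet-reading main thread) and the names `choose_worker_count`, `configured`, and `reserve_for_capture` are assumptions made for illustration; the invention does not prescribe a particular formula.

```python
import os


def choose_worker_count(configured=None, reserve_for_capture=1):
    """Return the number of worker threads to create (always at least 1).

    An explicit `configured` value wins; otherwise the count is derived
    from the number of CPU cores, keeping `reserve_for_capture` cores free
    (an assumed heuristic, not taken from the invention).
    """
    if configured is not None:
        return max(1, configured)
    cores = os.cpu_count() or 1   # os.cpu_count() may return None
    return max(1, cores - reserve_for_capture)
```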

In one embodiment, acquiring the data packet, looking up the coroutine context, and switching from the worker thread to the designated coroutine further comprises the following steps:

S31, the main thread reads data packets from the network device and, whenever a data packet is obtained, distributes it to a designated worker thread according to a certain rule;

S32, the worker thread takes the data packet from its queue and checks it against the documents already being parsed; when the packet belongs to an existing document, the coroutine context of that document is looked up according to connection features and document features, and when the packet belongs to a new document, the system creates a new coroutine;

S33, the designated coroutine, obtained by lookup or by creation, is switched into.
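
The lookup-or-create logic of steps S32 and S33 can be sketched as follows, under the assumption that a Python generator stands in for a coroutine with its own stack and context. `parse_document`, `WorkerCoroutines`, and `doc_key` (a value derived from the connection and document features) are hypothetical names, and the parser body is a placeholder that only records what it receives.

```python
def parse_document(doc_key, log):
    """Hypothetical parser coroutine: receives one packet per resume."""
    offset = 0
    while True:
        packet = yield            # control returns to the worker until resumed
        log.append((doc_key, offset, packet))
        offset += len(packet)     # local state survives between packets


class WorkerCoroutines:
    """Per-worker table that maps a document key to its coroutine."""

    def __init__(self, log):
        self.table = {}
        self.log = log

    def handle_packet(self, doc_key, packet):
        coro = self.table.get(doc_key)         # S32: look up existing context
        if coro is None:                       # S32: new document
            coro = parse_document(doc_key, self.log)
            next(coro)                         # run the coroutine to its first yield
            self.table[doc_key] = coro
        coro.send(packet)                      # S33: switch into the coroutine
```

Because `offset` is an ordinary local variable of the coroutine, the parser reads like code that handles a single document, even though packets of different documents are interleaved.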

In one embodiment, the certain rule includes, but is not limited to, a five-tuple rule, the source IP address, and the destination address.
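
A five-tuple rule could, for example, hash the five-tuple and take the result modulo the number of worker threads, so that every packet of one connection reaches the same worker. The sketch below is one such rule, not the rule of the invention: the use of CRC32, the ordering of the two endpoints so that both directions of a connection map to the same worker, and the names `FiveTuple` and `pick_worker` are all assumptions.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int


def pick_worker(flow, worker_count):
    """Map a flow to a worker index in the range [0, worker_count)."""
    # Sort the two endpoints so that A->B and B->A produce the same key.
    low, high = sorted([(flow.src_ip, flow.src_port),
                        (flow.dst_ip, flow.dst_port)])
    key = f"{low}|{high}|{flow.protocol}".encode()
    # CRC32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key) % worker_count
```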

In one embodiment, the system creating a new coroutine further comprises the following steps:

S321, allocating stack resources for the new coroutine;

and S322, creating a new coroutine context.

In one embodiment, obtaining the designated coroutine by lookup or creation and switching into the designated coroutine further comprises the following steps:

S331, saving the context of the current coroutine;

S332, modifying the current coroutine identifier;

S333, switching the stack pointer;

S334, calling the swapcontext function of the operating system to switch the user context.
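
The bookkeeping around a switch can be sketched as follows. This is only an analogy: Python offers no direct access to the stack pointer or to swapcontext, so resuming a generator with `send` stands in for steps S333 and S334, while steps S331 and S332 are modelled by remembering and updating a "current coroutine" identifier. `Switcher`, `MAIN`, and `upper_case` are hypothetical names.

```python
def upper_case():
    """Hypothetical coroutine: hands back each packet in upper case."""
    result = None
    while True:
        packet = yield result
        result = packet.upper()


class Switcher:
    """Tracks which coroutine is running on one worker thread."""

    MAIN = None  # hypothetical identifier meaning "the worker thread itself"

    def __init__(self):
        self.current = self.MAIN
        self.history = []

    def switch_to(self, coro_id, coro, packet):
        previous = self.current        # S331: remember what was running
        self.current = coro_id         # S332: update the current coroutine ID
        self.history.append(coro_id)
        try:
            # S333/S334 analogue: the interpreter resumes the generator frame;
            # a C implementation would switch stacks and call swapcontext here.
            return coro.send(packet)
        finally:
            self.current = previous    # control is back in the worker thread
```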

In one embodiment, the preset rule comprises the following steps:

S51, the currently designated coroutine yields the right of execution;

S52, the current coroutine is switched out, and execution returns to the state before the designated coroutine was entered;

and S53, the coroutine waits for the next scheduling.
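
Steps S51 to S53 correspond to what a `yield` does in a Python generator, which the following sketch uses as a stand-in for the coroutine of the invention. The parser `find_marker` is a hypothetical example that reports the stream offsets of a byte marker, including markers split across two packets; the marker value and the function name are assumptions.

```python
def find_marker(marker, hits):
    """Parser coroutine that scans a byte stream arriving packet by packet."""
    tail = b""      # last bytes of the stream seen so far (shorter than marker)
    position = 0    # stream offset of the first byte of `tail`
    while True:
        packet = yield                 # S51/S52: yield and return to the worker
        # S53: execution resumes here at the next scheduling
        data = tail + packet
        start = 0
        while True:
            index = data.find(marker, start)
            if index < 0:
                break
            hits.append(position + index)
            start = index + 1
        # Keep just enough bytes to detect a marker that straddles packets.
        cut = max(0, len(data) - (len(marker) - 1))
        tail = data[cut:]
        position += cut
```

For example, after `next(p)` on `p = find_marker(b"%PDF", hits)`, sending `b"xx%P"` and then `b"DF-1.7 %PDF"` leaves `hits` equal to `[2, 11]`: the first marker is found although it is split across the two packets.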

In one embodiment, releasing the designated coroutine when document processing ends further comprises the following steps:

S61, releasing the resources of the designated coroutine;

and S62, removing the designated coroutine from the coroutine list.

In one embodiment, the resources include, but are not limited to, a stack and a context.
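
The release steps can be sketched in the same generator-based model. Closing a generator discards its suspended frame, which here plays the role of the stack and context; in a C implementation the stack memory would be freed explicitly. `parser`, `release_coroutine`, and the dictionary used as the coroutine list are hypothetical.

```python
def parser(cleanup_log, doc_key):
    """Hypothetical parser coroutine that records when it is released."""
    try:
        while True:
            yield
    finally:
        cleanup_log.append(doc_key)   # runs when the coroutine is closed


def release_coroutine(table, doc_key):
    """Release the coroutine of a finished document; return False if unknown."""
    coro = table.get(doc_key)
    if coro is None:
        return False
    coro.close()          # S61: release the coroutine's resources
    del table[doc_key]    # S62: remove it from the coroutine list
    return True
```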

According to another embodiment of the present invention, as shown in FIG. 3, there is also provided a parallel execution system for streaming document parsing, including:

a starting module 1, configured to receive a document and start the streaming parsing service;

a worker thread group module 2, configured to create a worker thread group by a preset method;

a coroutine switching module 3, configured to acquire a data packet, look up the coroutine context, and switch from the worker thread to the designated coroutine;

a message parsing module 4, configured to parse the message after switching to the designated coroutine;

a worker thread switching module 5, configured to switch from the designated coroutine back to the worker thread according to a preset rule after the message has been processed;

and a coroutine releasing module 6, configured to release the designated coroutine when document processing ends.

In summary, the above technical solution of the present invention introduces coroutine technology into streaming parsing. Because a coroutine switch automatically restores the function call stack, no complex context switching logic is needed. Coroutine switching is completed in user mode and can therefore be very efficient. In addition, the number of threads that execute coroutines can be chosen as needed, so the multi-core CPU resources of the system can be fully utilized. The invention thus solves the problem of switching document contexts during streaming document parsing: parsing multiple documents simultaneously becomes similar to parsing a single document independently, which greatly reduces the programming complexity of processing multiple documents in parallel. The invention combines multiple threads with multiple coroutines, which avoids the low resource utilization of the single-thread mode as well as the excessive number of threads and the excessive thread switching overhead of the multi-thread mode. Because the creation, switching, and destruction of coroutines are all completed in user mode and do not involve time-consuming system calls, the number of coroutines can reach tens of thousands or more without additional performance loss. Moreover, coroutine switching is transparent to the application layer, which appears to be processing a single document independently and needs no complex operations such as context switching and restoration, so programming complexity is greatly reduced.

The above describes only preferred embodiments of the present invention and is not to be construed as limiting the invention. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present invention are intended to fall within its scope of protection.
