High-performance distributed combined multi-channel video real-time processing method


Reading note: This technique, "High-performance distributed combined multi-channel video real-time processing method", was designed and created by 刘必振 and 丁皓 on 2021-08-31. Its main content is as follows: the invention relates to a high-performance distributed combined multi-channel video real-time processing method, belonging to the technical field of video processing. The method comprises the following steps: A. constructing a video processing pipeline, wherein the video processing pipeline comprises a video decoding module, a video pre-processing module, a video model reasoning module and a video post-processing module; B. starting multiple processes/threads: each processing module starts multiple processes, and each process starts several threads; the number of processes/threads started by each module is determined jointly by the requirements of the service scene, the performance of the model and the software and hardware resource limitations; C. constructing shared queues between upstream and downstream modules and setting a data access strategy. The video decoding, pre-processing, model reasoning and post-processing modules are all separated and simple to use, and the specially designed data access mechanism between adjacent modules can significantly improve multi-channel video processing efficiency.

1. A high-performance distributed combined multi-channel video real-time processing method is characterized by comprising the following steps:

A. constructing a video processing pipeline, wherein the video processing pipeline comprises a video decoding module, a video pre-processing module, a video model reasoning module and a video post-processing module;

B. starting multiple processes/threads: each processing module starts a multi-process mode, and each process starts a plurality of threads; the number of processes/threads started by each module is determined jointly according to the requirements of service scenes, the performance of the model and the limitations of software and hardware resources;

C. and constructing a shared queue between an upstream module and a downstream module, and setting a data access strategy.

2. The method according to claim 1, wherein the starting multiple processes/threads comprises: setting the number of video source paths as N; setting the number of processes started by the decoding module as m_1, with each decoding process starting t_1 threads; setting the number of processes started by the pre-processing module as m_2, with each pre-processing process starting t_2 threads; setting the number of processes started by the model reasoning module as m_3, with each model reasoning process starting t_3 threads; and setting the number of processes started by the post-processing module as m_4, with each post-processing process starting t_4 threads.

3. The method according to claim 1, wherein step C specifically comprises the following steps:

c1, video source data reading: considering only the situation in which each video is decoded by exactly one process/thread while one process/thread may decode multiple videos, then

let i = a·m_1·t_1 + (k−1)·t_1 + j; if this equality holds for some non-negative integer a, where 1 ≤ k ≤ m_1 and 1 ≤ j ≤ t_1, then the i-th video path is assigned to the j-th thread of the k-th decoding process;

c2, data access between upstream and downstream module processes:

c21, shared queue number setting:

denoting the number of shared queues between module i and module i+1 as Q_i, it is required that Q_i ≥ max(m_i, m_(i+1)), which avoids multiple processes operating on the same queue at the same time and thereby incurring system switching overhead;

c22, designing a data access strategy;

c23, model batch reasoning.

4. The method according to claim 3, wherein C22 specifically comprises the following steps: 1) the case m_i = Q_i:

the upstream module processes are bound one-to-one with the shared queues; the data processed by each upstream module process is stored in its own dedicated shared queue, and the downstream module processes poll and read the data from the shared queues in pull mode;

2) the case m_(i+1) = Q_i and m_i < Q_i:

each upstream module process stores its processed result data into the shared queues in push mode, so that one upstream module process corresponds to a plurality of shared queues;

3) the case m_i < Q_i and m_(i+1) < Q_i:

the numbers of processes of both the upstream module and the downstream module are smaller than the number of shared queues, and data access is carried out in a high-performance distributed combination mode.

5. The high-performance distributed combined multi-channel video real-time processing method according to claim 3, wherein C23 specifically comprises the following operation steps:

1) establishing a temporary list, and setting the batch inference data size batch_size;

2) reading data from the input queue in non-waiting mode; if the read succeeds, saving the data to the temporary list and proceeding to step 3), otherwise jumping to step 4);

3) judging whether the length of the list is equal to batch_size; if so, proceeding to step 4), otherwise repeating step 2);

4) performing model batch reasoning on the temporary list data;

5) emptying the temporary list and repeating step 2).

6. The method according to claim 1, wherein the tools used by the decoding module include FFmpeg and VideoProcessingFramework.

Technical Field

The invention relates to a high-performance distributed combined multi-channel video real-time processing method, and belongs to the technical field of video processing.

Background

In many scenarios such as intelligent security and automatic driving, performing multi-dimensional analysis of videos (for example voices, text, faces, objects and scenes) and automatically extracting specific events or specific behaviors of monitored targets is a basic and very important task. In practical applications, the real-time performance of video processing and analysis and the number of concurrently processed channels are two critical indexes. NVIDIA provides a data stream analysis toolkit, DeepStream, which supports GPU hardware acceleration, so developers need not design end-to-end solutions and can instead focus on building the core deep learning networks for video analytics. However, DeepStream also has some disadvantages: because its modules are tightly bound, it is extremely tedious to dynamically delete/add/replace plug-ins in the Pipeline or to modify a module's function.

Disclosure of Invention

The invention aims to overcome the problems in the prior art and provide a high-performance distributed combined multi-channel video real-time processing method in which the video decoding, pre-processing, model reasoning and post-processing modules are fully separated; the method is simple to use, and the specially designed data access mechanism between adjacent modules can significantly improve multi-channel video processing efficiency.

In order to solve the above problems, the present invention provides a high-performance distributed multi-channel video real-time processing method, which comprises the following steps:

A. constructing a video processing pipeline, wherein the video processing pipeline comprises a video decoding module, a video pre-processing module, a video model reasoning module and a video post-processing module;

B. starting multiple processes/threads: each processing module starts a multi-process mode, and each process starts a plurality of threads; the number of processes/threads started by each module is determined jointly according to the requirements of service scenes, the performance of the model and the limitations of software and hardware resources;

C. and constructing a shared queue between an upstream module and a downstream module, and setting a data access strategy.

Further, the starting of multiple processes/threads comprises: setting the number of video source paths as N; setting the number of processes started by the decoding module as m_1, with each decoding process starting t_1 threads; setting the number of processes started by the pre-processing module as m_2, with each pre-processing process starting t_2 threads; setting the number of processes started by the model reasoning module as m_3, with each model reasoning process starting t_3 threads; and setting the number of processes started by the post-processing module as m_4, with each post-processing process starting t_4 threads.

Further, the step C specifically includes the following steps:

c1, video source data reading: considering only the situation in which each video is decoded by exactly one process/thread while one process/thread may decode multiple videos, then

let i = a·m_1·t_1 + (k−1)·t_1 + j; if this equality holds for some non-negative integer a, where 1 ≤ k ≤ m_1 and 1 ≤ j ≤ t_1, then the i-th video path is assigned to the j-th thread of the k-th decoding process;

c2, data access between upstream and downstream module processes:

c21, shared queue number setting:

denoting the number of shared queues between module i and module i+1 as Q_i, it is required that Q_i ≥ max(m_i, m_(i+1)), which avoids multiple processes operating on the same queue at the same time and thereby incurring system switching overhead;

c22, designing a data access strategy;

c23, model batch reasoning.

Further, C22 specifically includes the following steps: 1) the case m_i = Q_i:

the upstream module processes are bound one-to-one with the shared queues; the data processed by each upstream module process is stored in its own dedicated shared queue, and the downstream module processes poll and read the data from the shared queues in pull mode;

2) the case m_(i+1) = Q_i and m_i < Q_i:

each upstream module process stores its processed result data into the shared queues in push mode, so that one upstream module process corresponds to a plurality of shared queues;

3) the case m_i < Q_i and m_(i+1) < Q_i:

the numbers of processes of both the upstream module and the downstream module are smaller than the number of shared queues, and data access is carried out in a high-performance distributed combination mode.

Further, C23 specifically includes the following operation steps:

1) establishing a temporary list, and setting the batch inference data size batch_size;

2) reading data from the input queue in non-waiting mode; if the read succeeds, saving the data to the temporary list and proceeding to step 3), otherwise jumping to step 4);

3) judging whether the length of the list is equal to batch_size; if so, proceeding to step 4), otherwise repeating step 2);

4) performing model batch reasoning on the temporary list data;

5) emptying the temporary list and repeating step 2).

Further, the tools used by the decoding module include FFmpeg and VideoProcessingFramework.

The invention has the following beneficial effects: 1) each processing module is decoupled, which makes operation convenient; the video decoding, pre-processing, model reasoning and post-processing modules involved in the method are mutually independent, so adding, deleting and modifying modules is easy to realize.

2) The processing efficiency is high. Each processing module runs in multi-process mode, each process can start multiple threads, and the specially designed high-performance distributed combined data access mode between upstream and downstream modules maximizes multi-channel video processing efficiency.

3) The expandability is strong. The method is not limited to a particular implementation language; a user can choose a language as required, such as Python, which has a low barrier to entry, C, which has high running efficiency, or a combination of multiple languages.

Drawings

FIG. 1 is a block diagram of a high performance distributed combination multi-channel video real-time processing method according to the present invention;

FIG. 2 is a flow chart of the present invention for enabling multi-process/thread video decoding;

FIG. 3 is a flow chart of the data access in the multi-process pull mode of the present invention;

FIG. 4 is a flow chart of the multi-process push method for accessing data according to the present invention;

FIG. 5 is a flow chart of the data access by multiple processes according to the present invention;

FIG. 6 is a flow chart of model batch reasoning in the present invention.

Detailed Description

The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views that illustrate only the basic structure of the present invention and thus show only the components related to the present invention.

As shown in fig. 1, the high-performance distributed multi-channel video real-time processing method of the present invention includes the following steps:

A. constructing a video processing pipeline, wherein the video processing pipeline comprises a video decoding module, a video pre-processing module, a video model reasoning module and a video post-processing module; tools used by the decoding module include FFmpeg and VideoProcessingFramework.

The video pre-processing module mainly performs transformation operations on video frames according to the business and model reasoning requirements, such as image graying, pixel value standardization and image scaling. The video model reasoning module is mainly a deep learning model that analyzes video content according to the requirements of the service scene, such as target detection, pedestrian tracking, pose estimation and action recognition. The video post-processing module is mainly used for further analyzing and processing the model reasoning results, such as non-maximum suppression (NMS) and service logic judgment in target detection.
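By way of illustration, a minimal Python sketch of the pre-processing operations described above is given below; it assumes OpenCV and NumPy are available, and the function name preprocess_frame and the target size are illustrative choices rather than part of the claimed method.

    import cv2
    import numpy as np

    def preprocess_frame(frame, size=(640, 640)):
        """Gray, resize and standardize one decoded frame before model reasoning."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)    # image graying
        resized = cv2.resize(gray, size)                  # image scaling
        # pixel value standardization (zero mean, unit variance)
        resized = resized.astype(np.float32)
        return (resized - resized.mean()) / (resized.std() + 1e-6)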

The invention can call the same basic module multiple times according to business requirements; for example, two model reasoning modules can be called consecutively, with the result of the former video model reasoning module used as the input of the latter.

B. Starting multiple processes/threads: each processing module starts a multi-process mode, and each process starts a plurality of threads; the number of processes/threads started by each module is determined jointly according to the requirements of service scenes, the performance of the model and the limitations of software and hardware resources;

The starting of multiple processes/threads comprises: setting the number of video source paths as N; setting the number of processes started by the decoding module as m_1, with each decoding process starting t_1 threads; setting the number of processes started by the pre-processing module as m_2, with each pre-processing process starting t_2 threads; setting the number of processes started by the model reasoning module as m_3, with each model reasoning process starting t_3 threads; and setting the number of processes started by the post-processing module as m_4, with each post-processing process starting t_4 threads.
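A minimal sketch of step B follows, using Python's standard multiprocessing and threading modules; the worker argument and the helper names start_module and module_process are illustrative placeholders, since the invention does not fix an implementation language.

    import multiprocessing as mp
    import threading

    def module_process(worker, num_threads, in_queues, out_queues):
        """One module process: run num_threads worker threads over the module's shared queues."""
        threads = [threading.Thread(target=worker, args=(in_queues, out_queues), daemon=True)
                   for _ in range(num_threads)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()

    def start_module(worker, num_procs, num_threads, in_queues, out_queues):
        """Start num_procs processes for one pipeline module, each opening num_threads threads."""
        procs = [mp.Process(target=module_process,
                            args=(worker, num_threads, in_queues, out_queues))
                 for _ in range(num_procs)]
        for p in procs:
            p.start()
        return procs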

C. Constructing a shared queue between an upstream module and a downstream module, and setting a data access strategy;

the step C specifically comprises the following steps:

c1, video source data reading: considering only the situation in which each video is decoded by exactly one process/thread while one process/thread may decode multiple videos, then

let i = a·m_1·t_1 + (k−1)·t_1 + j; if this equality holds for some non-negative integer a, where 1 ≤ k ≤ m_1 and 1 ≤ j ≤ t_1, then the i-th video path is assigned to the j-th thread of the k-th decoding process. For example, for an 8-channel video source, 2 decoding processes are started and each process starts 2 threads; the resulting video stream data reading mode is shown in fig. 2.
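The assignment rule of C1 can be expressed as a small helper function; in the sketch below m1 and t1 follow the notation above (number of decoding processes and threads per decoding process), and the name assign_video is hypothetical.

    def assign_video(i, m1, t1):
        """Map the i-th video path (1-based) to (decoding process k, thread j), both 1-based."""
        r = (i - 1) % (m1 * t1)   # remainder rule over the m1*t1 decoding threads
        k = r // t1 + 1           # index of the decoding process
        j = r % t1 + 1            # index of the thread inside that process
        return k, j

    # The 8-channel example above: 2 decoding processes, 2 threads each.
    # Videos 1-4 fill threads (1,1), (1,2), (2,1), (2,2); videos 5-8 wrap around in the same order.
    for i in range(1, 9):
        print(i, assign_video(i, m1=2, t1=2))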

C2, data access between upstream and downstream module processes:

c21, shared queue number setting:

denote the number of shared queues between module i and module i+1 as Q_i; it is required that Q_i ≥ max(m_i, m_(i+1)), which avoids multiple processes operating on the same queue at the same time and thereby incurring system switching overhead;

C22, designing the data access strategy; data access adopts the rule of taking the remainder modulo the number of processes (the modulo rule).

C22 specifically includes the following steps: 1) the case m_i = Q_i:

the upstream module processes are bound one-to-one with the shared queues; the data processed by each upstream module process is stored in its own dedicated shared queue, and the downstream module processes poll and read the data from the shared queues in pull mode;

if there are 4 upstream module processes, 2 downstream module processes, and 4 shared queues, the access mode is as shown in fig. 3;

Adopting a non-waiting (non-blocking) read mode can further improve data reading efficiency; it can be realized, for example, as:

    from queue import Empty   # multiprocessing.Queue raises queue.Empty when no data is available

    while True:
        try:
            # non-blocking read from the shared queue between adjacent modules
            input_data = queue.get_nowait()
        except Empty:
            continue   # queue momentarily empty, keep polling
        # ... process input_data here ...

2) the case m_(i+1) = Q_i and m_i < Q_i:

each upstream module process stores its processed result data into the shared queues in push mode, so that one upstream module process corresponds to a plurality of shared queues; assuming that there are 2 upstream module processes, 4 downstream module processes and 4 shared queues, the data access method is shown in fig. 4.
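A minimal sketch of the push mode under the modulo rule is given below; the function producer_push and its arguments are illustrative only, assuming multiprocessing queues shared with the downstream module.

    def producer_push(proc_idx, num_up_procs, out_queues, results):
        """Upstream process proc_idx pushes its results into its own share of the queues."""
        # Modulo rule: this process owns every queue whose index q satisfies q % num_up_procs == proc_idx.
        my_queues = [q for q_idx, q in enumerate(out_queues) if q_idx % num_up_procs == proc_idx]
        for n, item in enumerate(results):
            my_queues[n % len(my_queues)].put(item)   # distribute results round-robin over the owned queues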

3) the case m_i < Q_i and m_(i+1) < Q_i:

the numbers of processes of both the upstream module and the downstream module are smaller than the number of shared queues; data access is then carried out in a high-performance distributed combination mode, i.e., each upstream process distributes its results across several queues in push mode while each downstream process polls several queues in pull mode; assuming that there are 2 upstream module processes, 2 downstream module processes and 4 shared queues, the data access method is shown in fig. 5.
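The distributed combination of this case simply applies the modulo rule on both sides: each upstream process pushes to the queues it owns and each downstream process polls the queues it owns. A minimal sketch, with hypothetical helper names:

    def queues_for_upstream(proc_idx, num_up_procs, num_queues):
        """Indices of the shared queues this upstream process pushes to under the remainder rule."""
        return [q for q in range(num_queues) if q % num_up_procs == proc_idx]

    def queues_for_downstream(proc_idx, num_down_procs, num_queues):
        """Indices of the shared queues this downstream process polls under the remainder rule."""
        return [q for q in range(num_queues) if q % num_down_procs == proc_idx]

    # The fig. 5 example: 2 upstream processes, 2 downstream processes, 4 shared queues.
    # Upstream process 0 pushes to queues [0, 2]; downstream process 1 polls queues [1, 3].
    print(queues_for_upstream(0, 2, 4), queues_for_downstream(1, 2, 4))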

C23, model batch reasoning; and a batch reasoning mechanism is adopted to further accelerate the model reasoning speed and reduce the data backlog possibly generated in the input queue, as shown in figure 6.

C23 specifically includes the following operation steps:

1) establishing a temporary list, and setting the batch inference data size batch_size;

2) reading data from the input queue in non-waiting mode; if the read succeeds, saving the data to the temporary list and proceeding to step 3), otherwise jumping to step 4);

3) judging whether the length of the list is equal to batch_size; if so, proceeding to step 4), otherwise repeating step 2);

4) performing model batch reasoning on the temporary list data;

5) emptying the temporary list and repeating step 2); a minimal sketch of this loop is given below.
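This sketch of the C23 batching loop assumes a Python multiprocessing queue as the input queue, and model.infer_batch is a hypothetical stand-in for the actual batch reasoning call.

    from queue import Empty   # multiprocessing.Queue raises queue.Empty on an empty non-waiting read

    def batch_inference_loop(input_queue, model, batch_size):
        """Collect up to batch_size items without waiting, then run one batch reasoning pass."""
        batch = []                                       # step 1: temporary list
        while True:
            try:
                batch.append(input_queue.get_nowait())   # step 2: non-waiting read, save on success
            except Empty:
                if batch:                                # step 4: queue empty, infer on what was collected
                    model.infer_batch(batch)
                    batch = []                           # step 5: empty the list, back to step 2
                continue
            if len(batch) == batch_size:                 # step 3: full batch reached
                model.infer_batch(batch)                 # step 4: model batch reasoning
                batch = []                               # step 5: empty the list, back to step 2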

According to the invention, each processing module is constructed in a multi-process mode, and a data access strategy between processes of adjacent modules is skillfully designed, so that the video processing speed is increased as much as possible while the video stream is ensured not to be out of order, and the real-time application requirement is met. In addition, the method is not limited to specific languages and has strong expansibility. The number of processes enabled by each module and the number of threads enabled by each process can be configured arbitrarily according to needs.

The processing method of the invention decouples the video decoding, pre-processing, model reasoning and post-processing modules and designs a dedicated data access method between the different modules. By editing the corresponding module functions, video content analysis such as target detection and tracking, pose estimation and action recognition can be performed.

In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.
