Performance optimization method of domestic embedded DSP operating system

Document No.: 1952543 Publication date: 2021-12-10

Reading note: this technology, "Performance optimization method of domestic embedded DSP operating system" (国产嵌入式dsp操作系统的性能优化方法), was designed and created by 赵俊才, 何玲玲, 王永兵 and 李富坤 on 2021-08-17. Main content: The invention provides a performance optimization method for a domestic embedded DSP operating system, comprising the following steps: step S1: multiple cores of the DSP chip use the same code segment space, and the code segment space is contiguous; step S2: discrete RAM space is mapped to a continuous space; step S3: each core runs a separate operating system instance. With the invention, the instructions and data used by each DSP core during data processing are all stored in on-chip RAM, which can significantly improve computing performance. This is especially valuable in fields such as radar signal processing, mobile communication and electronic countermeasures, which demand both rich functionality and high computing performance. Using a shared code segment also means that all cores can use the same project to complete the application design, which improves development efficiency and reduces development cost.

1. A performance optimization method for a domestic embedded DSP operating system, characterized by comprising the following steps:

step S1: multiple cores of the DSP chip use the same code segment space; the code segment space is contiguous;

step S2: mapping of discrete RAM space to continuous space;

step S3: each core runs a separate operating system instance.

2. The method as claimed in claim 1, wherein all cores use a shared code segment space, and the code segment is read-only during program execution.

3. The method for optimizing the performance of a domestic embedded DSP operating system according to claim 1, wherein the code segments of all cores are largely the same: the common part comprises the operating system's own code segment, the driver code segment and the common algorithm library, and the differing part is the program private to each core.

4. The method for optimizing the performance of a domestic embedded DSP operating system according to claim 1, wherein in said step S2 the code segments before mapping and after mapping are separated; the mapping process is executed by each core, and the code segments executed before mapping and the code segments executed after mapping are placed separately.

5. The method for optimizing the performance of a domestic embedded DSP operating system according to claim 4, wherein the initialization of the registers and the chip is performed before mapping, and the code segment that performs it is placed in the shared memory space inside the chip or in DDR outside the chip.

6. The method for optimizing the performance of a domestic embedded DSP operating system according to claim 1, wherein in step S2, a small portion of RAM space of each core is extracted and mapped into a large continuous space for storing code segments by using a space mapping method.

7. The method for optimizing the performance of the domestic embedded DSP operating system according to claim 6, wherein the mapping step is as follows:

step 1: setting the physical address of the RAM in each core, the logical address shared by all cores, and the total length of the code segments;

step 2: setting the access permissions of the mapped space, which comprise read, write and execute permissions; the code segment needs only the read and execute permissions.

8. The method for optimizing the performance of a domestic embedded DSP operating system according to claim 1, wherein in the initialization process of step S3 the global variables and the stack space of the embedded DSP operating system are set; in the embedded DSP operating system of AMP architecture each core has an independent space, and the independent space uses the remaining on-chip RAM resources.

9. The method for optimizing the performance of a domestic embedded DSP operating system according to claim 8, wherein, through the space layout, the resources required while the operating system runs, such as the code segments, data segments and stack space of the application program, are placed in the RAM of each core.

10. The method as claimed in claim 9, wherein after the initialization of the operating system is completed, the core number of the current core is obtained by reading a register in the core, and the task assigned to the current core is executed according to the core number; both the code segments shared by the operating system and the drivers and the code segments private to each core are placed uniformly, so that the spatial distribution is the same for all cores.

Technical Field

The invention relates to the technical field of embedded DSP operating systems, in particular to a performance optimization method of a domestic embedded DSP operating system.

Background

Compared with embedded CPUs, operating systems for embedded multi-core DSP chips developed late. The traditional DSP development mode is bare-metal development: the functions performed by the application are single and uncomplicated, and the user accesses the hardware resources and drivers of the DSP chip directly, without an operating system. In such scenarios the user can place the code segments and data segments of the application directly in the high-speed RAM inside the DSP core and run them there, so that data is processed quickly.

As the integration level of DSP chips and the complexity of applications increase, the traditional bare-metal development mode can no longer meet requirements. Without an operating system, multitasking, network services, file systems and the like are unavailable, so using an operating system on the DSP chip has become the general trend. However, an operating system brings another problem: compared with bare-metal development, the required running space increases several-fold, and the limited RAM space of the DSP core cannot accommodate it.

Patent document CN112035346A discloses an automated testing method, system and medium based on an embedded DSP operating system, which includes: step 1: starting the test software of the target machine and the host computer; step 2: acquiring a test case list; step 3: testing according to the test case list and feeding back the test result.

At present, especially in application scenarios in the field of military equipment, the DSP chip is required to offer richer functions without sacrificing fast data processing. A new performance optimization method that makes the rich resources of an operating system available while preserving the running efficiency of the application is therefore a problem to be solved urgently in the research and development of domestic DSP embedded systems.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a performance optimization method of a domestic embedded DSP operating system.

According to the performance optimization method of the domestic embedded DSP operating system provided by the invention, the method comprises the following steps:

step S1: multiple cores of the DSP chip use the same code segment space; the code segment space is contiguous;

step S2: mapping of discrete RAM space to continuous space;

step S3: each core runs a separate operating system instance.

Preferably, all cores use a shared code section space, and the code section is read-only during program execution.

Preferably, the code segments of all cores are largely the same: the common parts include the operating system's own code segments, the driver code segments and the common algorithm library, and the differing parts are the programs private to each core.

Preferably, the code segments before mapping and after mapping in step S2 are separated; the mapping process is executed by each core, and the code segments executed before mapping and the code segments executed after mapping are placed separately.

Preferably, the initialization of the registers and the chip is performed before mapping, and the code segment that performs it is placed in on-chip shared memory space or in off-chip DDR.

Preferably, in step S2, a small part of RAM space of each core is extracted and mapped into a large continuous space for storing code segments by using a space mapping method.

Preferably, the mapping step is as follows:

step 1: setting the physical address of the RAM in each core, the logical address shared by all cores, and the total length of the code segments;

step 2: setting the access permissions of the mapped space, which comprise read, write and execute permissions; the code segment needs only the read and execute permissions.

Preferably, in the initialization process of step S3, the global variables and the stack space of the embedded DSP operating system are set; in the embedded DSP operating system of AMP architecture each core has an independent space, and the independent space uses the remaining on-chip RAM resources.

Preferably, through the space layout, the resources required while the operating system runs, such as the code segments, data segments and stack space of the application program, are placed in the RAM of each core.

Preferably, after the initialization of the operating system is completed, the core number of the current core is obtained by reading an in-core register, and the task assigned to the current core is executed according to the core number; both the code segments shared by the operating system and the drivers and the code segments private to each core are placed uniformly, so that the spatial distribution is the same for all cores.

Compared with the prior art, the invention has the following beneficial effects:

1. with the invention, the instructions and data used by each DSP core during data processing are all stored in on-chip RAM, which can significantly improve computing performance; this is especially valuable in fields such as radar signal processing, mobile communication and electronic countermeasures, which demand both rich functionality and high computing performance;

2. the invention uses a shared code segment, which means that all cores can use the same project to complete the application design, improving development efficiency and reducing development cost;

3. the invention can greatly improve the computing performance of a domestic embedded DSP operating system based on the AMP architecture; under the AMP architecture, every DSP core uses the single continuous code segment space that the embedded operating system requires; each DSP core has an independent global data segment and an independent running space; and the code segment, the data segment and the running space of the operating system are guaranteed to reside entirely in the high-speed RAM inside the DSP chip, so that data is processed quickly.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a diagram of a memory space structure of a multi-core DSP chip (6678) according to the present invention;

FIG. 2 is a diagram illustrating a comparison of the space occupied by the independent code sections and the shared code sections;

FIG. 3 is a mapping of physical space to logical space in accordance with the present invention;

FIG. 4 is a diagram illustrating the utilization of the in-core RAM resources according to the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.

The invention relates to a performance optimization method of a domestic embedded DSP operating system, which comprises the following steps:

Step S1: multiple cores of the DSP chip use the same code segment space. At present, mainstream DSP chips basically adopt the AMP architecture: one chip consists of several cores of equal standing, each core has its own high-speed RAM, and the chip also provides a shared memory space. Because of cost constraints, the capacity of both the in-core high-speed RAM and the on-chip shared memory is very limited. Taking the most common 6678 model DSP chip as an example, each chip includes 8 cores, each core has 512K of RAM, and the chip also provides 4M of shared memory space; the structure is shown in fig. 1.

From the viewpoint of processing performance, running from the in-core RAM is much faster than running from the on-chip shared memory space or the off-chip DDR, which is determined by the physical characteristics of the memories themselves. Once an operating system is used, the code segment space grows several-fold, and the RAM of a single core can no longer hold the operating system on its own. Therefore all cores use a shared code segment space. Because code segments are read-only while the program runs, sharing them among multiple cores cannot cause data inconsistency at run time.

In fact, most of the code segments of all cores are the same. The common parts include the code segments of the operating system itself, the driver code segments, the common algorithm library and so on, all of which are independent of the individual core. The differing parts are the programs private to each core, which account for only a very small share of the code segments. Having multiple cores use the same code segment therefore saves far more space than giving each core an independent code segment, as shown in fig. 2, where the left side is the space occupied when each core uses an independent code segment and the right side is the space occupied when the cores share the same code segment.
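The comparison in fig. 2 can be expressed as simple arithmetic. The following is a minimal sketch, not the patent's implementation: the function names are hypothetical, and it assumes for simplicity that every core's private program has the same size.

```c
#include <assert.h>
#include <stddef.h>

/* Total code space when every core keeps its own full copy:
 * each core stores the common part plus its private program. */
size_t independent_code_bytes(size_t cores, size_t common, size_t private_per_core)
{
    return cores * (common + private_per_core);
}

/* Total code space when all cores share one code segment:
 * the common part is stored once, plus every core's private program. */
size_t shared_code_bytes(size_t cores, size_t common, size_t private_per_core)
{
    return common + cores * private_per_core;
}

/* Bytes saved by sharing, which equals (cores - 1) copies of the common part. */
size_t shared_code_savings(size_t cores, size_t common, size_t private_per_core)
{
    assert(cores >= 1);
    return independent_code_bytes(cores, common, private_per_core)
         - shared_code_bytes(cores, common, private_per_core);
}
```

With purely illustrative figures of 8 cores, 400 KB of common code and 16 KB of private code per core, independent code segments need 3328 KB in total, a shared code segment needs 528 KB, and the saving is 2800 KB. These sizes are made up for the example and do not come from the source.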

Step S2: mapping of the discrete RAM space to a continuous space. The shared code segment space of step S1 contains the operating system, driver code segments and common algorithm library shared by all cores, and also the code segments of every core's private program, so it is necessarily larger than the code segment space of a single core. It is difficult to place this code segment in the RAM of one core, because while the operating system and the application run, each core also needs private space such as data segments and stack space in addition to the code segment. If the code segment is instead placed in the on-chip shared memory space or in off-chip DDR, performance drops significantly.

For embedded DSP operating systems, the code segment space is typically contiguous. In the present invention, a space mapping method is used, a small portion of RAM space of each core is extracted and mapped into a large continuous space for storing code segments, as shown in fig. 3.

Taking the mainstream 8-core 66-series DSP chip as an example, each core contributes 1/4 (128K) or 1/8 (64K) of its RAM, and these portions are mapped into a continuous space of 1M or 512K respectively (8 × 128K = 1M, 8 × 64K = 512K), which is more than enough to store the shared code segments.
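The sizes above, and the way an offset in the continuous space falls back onto one core's RAM, can be sketched as follows. This is an illustrative model only: the function and type names are hypothetical, and the assumption that the slices are laid out in core order is not stated in the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CORES   8u              /* 8-core 66-series chip, as in the text */
#define SLICE_BYTES (128u * 1024u)  /* 1/4 of each core's 512K RAM */

/* Size of the continuous window built from one slice per core. */
uint32_t window_bytes(uint32_t cores, uint32_t slice_bytes)
{
    return cores * slice_bytes;
}

/* Where a byte of the continuous window physically lives. */
typedef struct {
    uint32_t core;    /* core whose RAM backs this offset */
    uint32_t offset;  /* byte offset inside that core's slice */
} slice_location;

/* Resolve an offset in the continuous window to (core, offset in slice).
 * Assumption: slices are concatenated in core order 0..NUM_CORES-1.
 * Returns false when the offset lies outside the window. */
bool locate_in_window(uint32_t window_offset, slice_location *out)
{
    assert(out != NULL);
    if (window_offset >= window_bytes(NUM_CORES, SLICE_BYTES)) {
        return false;
    }
    out->core = window_offset / SLICE_BYTES;
    out->offset = window_offset % SLICE_BYTES;
    return true;
}
```

Under these assumptions, offset 0x20010 of the window resolves to core 1 at offset 0x10, because each slice is 0x20000 bytes long.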

The mapping steps are as follows. First, set the physical address of the RAM in each core, the logical address shared by all cores, and the total length of the available code segment. Second, set the access permissions of the mapped space, which include read, write and execute permissions; for the code segment, only the read and execute permissions are needed.
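The two mapping steps can be pictured as filling in one table entry per core. The sketch below is a host-side model under stated assumptions: the structure, the function name and all address constants are hypothetical placeholders chosen for illustration, not values from the source or from any datasheet, and the hardware mechanism that actually applies such a table is not described in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Access permission bits for a mapped region. */
enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

/* One mapping entry: a slice of one core's RAM placed in the shared window. */
typedef struct {
    uint32_t phys_addr;     /* physical address of the core's RAM slice */
    uint32_t logical_addr;  /* logical address seen by all cores */
    uint32_t length;        /* slice length in bytes */
    uint32_t perms;         /* PERM_* bits */
} map_entry;

#define MAP_CORES 8u
#define MAP_SLICE (128u * 1024u)

/* Hypothetical placeholder addresses, for illustration only. */
#define HYP_PHYS_BASE    0x10800000u  /* slice of core 0 */
#define HYP_PHYS_STRIDE  0x01000000u  /* distance between cores' RAM */
#define HYP_LOGICAL_BASE 0x20000000u  /* start of the shared window */

/* Step 1: set physical address, shared logical address and length.
 * Step 2: set permissions; code needs only read and execute.
 * Returns the number of entries written. */
size_t build_code_map(map_entry table[], size_t capacity)
{
    assert(capacity >= MAP_CORES);
    for (uint32_t core = 0; core < MAP_CORES; ++core) {
        table[core].phys_addr = HYP_PHYS_BASE + core * HYP_PHYS_STRIDE;
        table[core].logical_addr = HYP_LOGICAL_BASE + core * MAP_SLICE;
        table[core].length = MAP_SLICE;
        table[core].perms = PERM_R | PERM_X;  /* no write permission */
    }
    return MAP_CORES;
}
```

In this model the physical slices are far apart while the logical addresses are back to back, which is what makes the window continuous. Leaving out the write permission matches the read-only use of the code segment described in step S1.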

Step S3: separation of the pre-mapping and post-mapping code segments. Because the mapping process of step S2 is executed by each core itself, the question arises of where to place the code that runs before the mapping. Since the logical addresses of the code segment are not yet available before mapping, the approach taken in the present invention is to place the code segments executed before mapping and the code segments executed after mapping separately.

The most basic initialization of the registers and the chip is executed before mapping. These operations run only once, in the initial stage of operating system startup, so this part of the code segment can be specifically placed in the on-chip shared memory space or in the off-chip DDR. Because it executes only once and is not used in the subsequent data processing, its location does not affect later processing performance. This part of the code segment is also small and can be flexibly placed at any location without additional mapping.

Step S4: each core runs a separate operating system instance. This step is linked to step S2: when the operating system starts and the mapping of step S2 has been executed, the corresponding initialization is performed. During initialization, the global variables, stack space and so on of the core are set up. In an embedded DSP operating system of the AMP architecture each core has its own independent space, and to improve processing performance these spaces can use the remaining RAM resources of the core; the allocation of the RAM resources on each core is shown in fig. 4.

With this space layout, the resources required at run time, such as the operating system and the code segments, data segments and stack space of the application, can all be placed in the RAM of each core. During data processing, the storage used by both instructions and data is the RAM inside the cores, so access performance is greatly improved compared with the on-chip shared memory and the off-chip DDR.
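The per-core allocation in fig. 4 amounts to a budget check: the slice given to the shared code window plus the core's private data and stack must fit in its 512K of RAM. The following is a minimal sketch of that check; the structure, the function name and the example sizes are hypothetical and are not taken from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CORE_RAM_BYTES (512u * 1024u)  /* RAM per core on the 6678, per the text */

/* How one core's RAM is divided. */
typedef struct {
    uint32_t code_slice;  /* bytes contributed to the shared code window */
    uint32_t data;        /* global variables / data segment */
    uint32_t stack;       /* stack space */
    uint32_t spare;       /* bytes left unallocated */
} core_ram_layout;

/* Plan the layout of one core's RAM.
 * Returns 1 and fills *out when everything fits, 0 otherwise. */
int plan_core_ram(uint32_t code_slice, uint32_t data, uint32_t stack,
                  core_ram_layout *out)
{
    assert(out != NULL);
    /* Sum in 64 bits so that oversized requests cannot wrap around. */
    uint64_t used = (uint64_t)code_slice + data + stack;
    if (used > CORE_RAM_BYTES) {
        return 0;
    }
    out->code_slice = code_slice;
    out->data = data;
    out->stack = stack;
    out->spare = CORE_RAM_BYTES - (uint32_t)used;
    return 1;
}
```

For example, a 128K code slice with 256K of data and 64K of stack fits and leaves 64K spare, whereas raising the data segment to 384K would exceed the 512K budget and be rejected.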

After the initialization of the operating system is completed, the core number of the current core is obtained by reading an in-core register, and the task assigned to the current core is then executed according to that number. Both the code segments shared by the operating system and the drivers and the code segments private to each core are placed uniformly, so the spatial distribution is the same for all cores.
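Because every core runs the same shared code, selecting work by core number is a simple dispatch. The sketch below models this on a host machine: the task functions, the rule that core 0 runs a different task, and the stub that stands in for reading the in-core register are all hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>

#define DISPATCH_CORES 8u

typedef int (*core_task)(void);

/* Hypothetical private programs; the return values only identify the task. */
static int task_control(void) { return 100; }
static int task_signal(void)  { return 200; }

/* Choose the task for a core. Assumed rule for illustration:
 * core 0 runs the control task, all other cores run signal processing. */
core_task select_task(unsigned core_id)
{
    assert(core_id < DISPATCH_CORES);
    return core_id == 0u ? task_control : task_signal;
}

/* Host-side stand-in for reading the core number from an in-core register.
 * On real hardware this would be a register read, not a variable. */
unsigned fake_core_id = 0u;
unsigned read_core_id_stub(void) { return fake_core_id; }

/* Entry point shared by all cores: read the core number, run its task. */
int run_core(unsigned (*read_core_id)(void))
{
    assert(read_core_id != NULL);
    return select_task(read_core_id())();
}
```

Passing the register read in as a function pointer keeps the dispatch logic testable off-target; all cores execute the same `run_core` and diverge only after the core number is known.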

With the invention, the instructions and data used by each DSP core during data processing are all stored in on-chip RAM, which can significantly improve computing performance; this is especially valuable in fields such as radar signal processing, mobile communication and electronic countermeasures, which demand both rich functionality and high computing performance. The invention uses a shared code segment, which means that all cores can use the same project to complete the application design, improving development efficiency and reducing development cost.

The invention can greatly improve the computing performance of a domestic embedded DSP operating system based on the AMP architecture. Under the AMP architecture, every DSP core uses the single continuous code segment space that the embedded operating system requires; each DSP core has an independent global data segment and an independent running space; and the code segment, the data segment and the running space of the operating system are guaranteed to reside entirely in the high-speed RAM inside the DSP chip, so that data is processed quickly.

Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
