Self-powered system-oriented storage and calculation integrated framework and software optimization method

Document No.: 168654. Published: 2021-10-29.

Reading note: This technology, a self-powered system-oriented storage and calculation integrated framework and software optimization method, was designed by 邱柯妮 (Qiu Keni), 周坤雨 (Zhou Kunyu), and 粟傈 (Su Li) on 2021-05-18. Its main content is as follows. The invention discloses a storage and calculation integrated architecture for self-powered systems and a software optimization method. The architecture comprises an energy collection and management module, a CPU module, and a storage and calculation integrated module. The output end of the energy collection and management module is electrically connected with the input ends of the CPU module and the storage and calculation integrated module, and the CPU module is bidirectionally electrically connected with the storage and calculation integrated module. An STT-MRAM array is disposed inside the storage and calculation integrated module. The invention lets an edge device run a binary neural network efficiently by relying on an accelerator module based on the STT-MRAM array, which avoids the high data transmission overhead of the existing practice in which edge devices wirelessly transmit large amounts of data to a higher-performance server for processing.

1. A storage and calculation integrated processing architecture for a self-powered system, comprising:

an energy collection and management module, a CPU module, and a storage and calculation integrated module, wherein the output end of the energy collection and management module is electrically connected with the input ends of the CPU module and the storage and calculation integrated module, and the CPU module is bidirectionally electrically connected with the storage and calculation integrated module; and

an STT-MRAM array disposed inside the storage and calculation integrated module.

2. The storage and calculation integrated processing architecture according to claim 1, wherein an energy collector and an energy management unit are disposed inside the energy collection and management module, the energy management unit comprises an energy storage capacitor and a DC/DC converter, an output end of the energy storage capacitor is connected to an input end of the DC/DC converter, and an output end of the energy collector is electrically connected to an input end of the energy management unit.

3. The storage and calculation integrated processing architecture according to claim 2, wherein the energy collector comprises a photovoltaic solar panel, a wind power generation module, a wireless radio frequency charging module, a kinetic energy power generation module, and a thermal energy power generation module, the output ends of which are each electrically connected with the input end of the energy management unit.

4. The storage and calculation integrated processing architecture according to claim 1, wherein the STT-MRAM arrays are distributed inside the storage and calculation integrated module, and a 1T1MTJ unit is disposed inside each of the STT-MRAM arrays.

5. An application method of the storage and calculation integrated processing architecture according to any of claims 1-4, comprising the following steps:

step one: designing, based on the STT-MRAM array, a reconfigurable in-memory processing architecture for self-powered scenarios, so that the architecture supports efficient operation of a binary neural network;

step two: mapping the binary neural network convolution calculation onto the hardware platform, the binary convolution calculation being realizable by different logic combinations;

step three: applying an adaptive software optimization method to adapt to energy fluctuations and use the available energy as fully as possible.

6. The application method according to claim 5, wherein mapping the binary neural network onto the hardware platform in step two comprises the following steps:

step one: exploiting the reconfigurability of the hardware architecture, completing the multiply-add operations of the binary neural network with XOR, XNOR, AND, and OR combinational logic;

step two: obtaining the different ways of mapping the binary neural network calculation onto the hardware architecture;

step three: determining the system power consumption corresponding to each mapping mode, which provides a basis for the offline modeling of the adaptive software optimization method.

7. A software optimization method for the storage and calculation integrated processing architecture according to any of claims 1-4, comprising the following two parts:

an offline modeling part: analyzing offline the power consumption and delay with which each logic combination completes the binary convolution calculation, and deriving a decision table from the analysis results and the energy levels, wherein the offline decision table contains, for each energy level, the optimal logic combination to execute together with its power consumption and delay, so as to provide execution decisions as the energy fluctuates during online simulation;

and an online simulation part: taking the offline decision table and an energy trace table as inputs, wherein the energy trace table simulates an unstable self-powered scenario and the offline decision table is the basis for adapting to energy changes, and simulating the execution of the binary neural network on the in-memory processing architecture in a self-powered system.

8. The software optimization method according to claim 7, wherein determining the offline decision table in the offline modeling part comprises the following steps:

step one: acquiring an energy trace, dividing it into energy levels according to its characteristics, and determining the energy level intervals and the number of energy levels;

step two: obtaining the logic combination suited to each energy level from the obtained power consumption, delay, and energy level division, thereby obtaining the offline decision table, which serves as input for the online simulation.

9. The software optimization method according to claim 8, wherein determining the logic combination in the online simulation part comprises the following steps:

step one: traversing the energy trace table, determining the energy level of the current energy, and selecting, according to the offline decision table, a suitable logic combination to execute the binary neural network convolution calculation, so as to adapt to the energy fluctuations of a self-powered scenario;

step two: when the energy level is low, selecting a logic combination with low power consumption, so that energy is not wasted; when the energy level is higher, selecting a logic combination with lower delay; and when the energy is higher still, selecting parallel execution, so that the energy is used as fully as possible;

step three: after the traversal of the trace table is complete, calculating the energy efficiency and throughput of the adopted architecture.

Technical Field

The invention relates to the technical field of computer architecture and storage, and in particular to a storage and calculation integrated architecture for self-powered systems and a software optimization method.

Background

Deep learning has shown excellent performance in intelligent applications such as natural language processing, computer vision, and speech recognition. Applying deep learning to embedded edge devices makes those devices more intelligent and able to solve a wider range of problems. However, the high memory capacity and computing power that deep learning algorithms require make complex neural network algorithms unsuitable for deployment on resource-limited devices. Devices based on the traditional von Neumann architecture face two further obstacles: the memory wall caused by the large gap between CPU processing speed and memory access speed, and the power consumption caused by heavy data movement and storage medium leakage. Both further limit the feasibility of applying deep learning on edge devices.

In addition, with the development of energy harvesting technology, a device can power itself without a battery by collecting energy from its surroundings (such as solar energy, wind energy, radio frequency energy, human body heat, and kinetic energy). Such a self-powered system is green and economical and needs no battery replacement, maintenance, or charging. The collected environmental energy is unstable, however, and making use of unstable energy is a major challenge.

When a neural network algorithm is deployed on an edge device, the conventional approach is to transmit large amounts of data to a higher-performance computer for processing. Transmitting data consumes more energy than storing or computing it and also introduces delay, so this approach is unsuitable for devices with limited power, energy, and bandwidth. Local intelligent processing on edge devices in self-powered application scenarios therefore faces significant challenges.

Disclosure of Invention

An object of the present invention is to solve at least the above problems and to provide at least the advantages described later.

A further object of the invention is to provide a storage and calculation integrated processing architecture and a software optimization method that let an edge device run a binary neural network efficiently by relying on an accelerator module based on STT-MRAM (spin transfer torque magnetic random access memory), thereby avoiding the high data transmission overhead of the existing practice in which edge devices wirelessly transmit large amounts of data to a higher-performance server for processing.

In order to achieve the above objects and other objects, the present invention adopts the following technical solutions:

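a storage and calculation integrated processing architecture, comprising:

an energy collection and management module, a CPU module, and a storage and calculation integrated module, wherein the output end of the energy collection and management module is electrically connected with the input ends of the CPU module and the storage and calculation integrated module, and the CPU module is bidirectionally electrically connected with the storage and calculation integrated module; and

an STT-MRAM array disposed inside the storage and calculation integrated module.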

Preferably, an energy collector and an energy management unit are arranged inside the energy collection and management module, the energy management unit comprises an energy storage capacitor and a DC/DC converter, an output end of the energy storage capacitor is connected with an input end of the DC/DC converter, and an output end of the energy collector is electrically connected with an input end of the energy management unit.

Preferably, the energy collector comprises a photovoltaic solar panel, a wind power generation module, a wireless radio frequency charging module, a kinetic energy power generation module and a thermal energy power generation module, and the output ends of the photovoltaic solar panel, the wind power generation module, the wireless radio frequency charging module, the kinetic energy power generation module and the thermal energy power generation module are electrically connected with the input end of the energy management unit.

Preferably, the STT-MRAM arrays are distributed in a crossed arrangement inside the storage and calculation integrated module, and each STT-MRAM array is internally provided with a 1T1MTJ unit.

Preferably, the application of the storage and computation integrated processing architecture comprises the following steps:

step one: designing, based on STT-MRAM, a reconfigurable in-memory processing architecture for self-powered scenarios, so that the architecture supports efficient operation of a binary neural network;

step two: the binary convolution calculation can be realized by different logic combinations, and the binary neural network convolution calculation is mapped to a hardware platform;

step three: the method is optimized by adaptive software to adapt to fluctuations in energy to make the best possible use of energy.

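Preferably, in step two, mapping the binary neural network onto the hardware platform comprises the following steps:

step one: exploiting the reconfigurability of the hardware architecture, completing the multiply-add operations of the binary neural network with XOR, XNOR, AND, and OR combinational logic;

step two: obtaining the different ways of mapping the binary neural network calculation onto the hardware architecture;

step three: determining the system power consumption corresponding to each mapping mode, which provides a basis for the offline modeling of the adaptive software optimization method.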
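
As an illustration of how a binary multiply-add reduces to bitwise logic, the following Python sketch computes a dot product with XNOR and a bit count. It is a software analogy only, not the patented circuit. It assumes the common binary neural network encoding in which +1 is stored as bit 1 and -1 as bit 0, which this text does not specify, and the function names are hypothetical.

```python
def binary_dot_xnor(x_bits, w_bits):
    """Dot product of two {-1, +1} vectors encoded as bits (1 -> +1, 0 -> -1)."""
    if len(x_bits) != len(w_bits):
        raise ValueError("vectors must have the same length")
    # XNOR is 1 where the bits agree, which is exactly where the product is +1.
    matches = sum(1 for x, w in zip(x_bits, w_bits) if not (x ^ w))
    # 'matches' products equal +1 and the remaining ones equal -1.
    return 2 * matches - len(x_bits)


def reference_dot(x_bits, w_bits):
    """The same result computed with ordinary multiply-add, for comparison."""
    def to_pm1(bit):
        return 1 if bit else -1
    return sum(to_pm1(x) * to_pm1(w) for x, w in zip(x_bits, w_bits))


x = [1, 0, 1, 1, 0]
w = [1, 1, 0, 1, 0]
assert binary_dot_xnor(x, w) == reference_dot(x, w)
```

With XOR instead of XNOR, the same value is the vector length minus twice the number of mismatching bits, which is why either gate can serve as the multiplier.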

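Preferably, the software optimization method of the storage and calculation integrated processing architecture comprises two parts: an offline modeling part, which analyzes offline the power consumption and delay of each logic combination and derives a decision table giving the optimal logic combination for each energy level; and an online simulation part, which takes the offline decision table and an energy trace table as inputs and simulates the execution of the binary neural network on the in-memory processing architecture in a self-powered system.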

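Preferably, in the offline modeling part, determining the offline decision table comprises the following steps: acquiring an energy trace and dividing it into energy levels according to its characteristics; and obtaining, from the power consumption, delay, and energy levels, the logic combination suited to each energy level, thereby obtaining the offline decision table.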

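Preferably, in the online simulation part, determining the logic combination comprises the following steps:

step one: traversing the energy trace table, determining the energy level of the current energy, and selecting a suitable logic combination according to the offline decision table to execute the binary neural network convolution calculation;

step two: selecting a low-power logic combination when the energy level is low, a lower-delay logic combination when the energy level is higher, and parallel execution when the energy is higher still;

step three: after the traversal of the trace table is complete, calculating the energy efficiency and throughput of the adopted architecture.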
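
The text does not define how energy efficiency and throughput are computed. The following Python sketch shows one plausible reading, assumed here for illustration only: throughput as operations completed per second, and energy efficiency as operations completed per joule consumed. The function names and the example figures are hypothetical.

```python
def throughput_ops_per_s(operations, elapsed_s):
    """Operations completed per second over the whole trace (assumed definition)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return operations / elapsed_s


def energy_efficiency_ops_per_j(operations, consumed_j):
    """Operations completed per joule consumed (assumed definition)."""
    if consumed_j <= 0:
        raise ValueError("consumed_j must be positive")
    return operations / consumed_j


# Placeholder totals accumulated while traversing a trace; not measured data.
total_ops = 10_200
print(throughput_ops_per_s(total_ops, elapsed_s=4.0))
print(energy_efficiency_ops_per_j(total_ops, consumed_j=1e-3))
```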

The invention at least comprises the following beneficial effects:

1. The invention provides an energy collection and management module, a CPU module, and a storage and calculation integrated module. STT-MRAM is distributed in a crossed array inside the storage and calculation integrated module, and the STT-MRAM in the array contains 1T1MTJ units. Each 1T1MTJ unit supports AND, OR, NOT, and XOR logic, and different logic can be realized by combining several 1T1MTJ units. This reconfigurability provides hardware support for adapting to energy fluctuations and lets the module assist the CPU in processing data. The module is connected directly and locally to the edge device and consumes little energy. The proposed in-memory processing architecture and adaptive software optimization method allow a binary neural network to run efficiently on the architecture. Because STT-MRAM is non-volatile, data is not lost when the device loses power, so the system can adapt to energy fluctuations and make full use of the available energy. A self-powered embedded device can therefore complete intelligent inference locally and efficiently at the edge without depending on the cloud, which reduces the load on network transmission. This avoids the drawbacks of existing edge devices, which mainly transmit large amounts of data wirelessly to a higher-performance computer for processing, even though transmitting data consumes more energy than storing or computing it and introduces delay.

2. The energy collection and management module supplies energy to the storage and calculation integrated module and the CPU. It comprises an energy collector and an energy management unit, and the collector comprises a photovoltaic solar panel, a wind power generation module, a wireless radio frequency charging module, a kinetic energy power generation module, and a thermal energy power generation module. It can convert wind energy, solar energy, radio frequency energy, kinetic energy, thermal energy, and the like into electric energy for the storage and calculation integrated module and the CPU module, achieving self-powering. This saves energy, protects the environment, and avoids tedious battery maintenance. The system is widely applicable and can meet the needs of many kinds of edge devices, such as smart bracelets, wearable devices, wildlife monitoring equipment, and exploration instruments.

Drawings

FIG. 1 is a block diagram of a storage and computation integrated processing architecture according to the present invention;

FIG. 2 is a process diagram of a binary neural network convolution calculation provided by the present invention;

FIG. 3 is a memory schematic diagram of a 1T1MTJ cell in an STT-MRAM array provided by the present invention;

FIG. 4 is a schematic diagram of an in-memory computing array implementing logic operation of an accelerated binary neural network provided by the present invention;

FIG. 5 is a diagram of three binary convolution neural network calculation mapping modes provided by the present invention;

FIG. 6 is an exemplary diagram of an environmental energy sampling trace provided by the present invention;

FIG. 7 is a diagram of an offline decision table for a first layer convolution operation provided by the present invention.

Detailed Description

The present invention is described in detail below with reference to the attached drawings so that those skilled in the art can implement the invention by referring to the description.

As shown in FIGS. 1-7, a storage and calculation integrated processing architecture comprises: an energy collection and management module 1, a CPU module 2, and a storage and calculation integrated module 3, wherein the output end of the energy collection and management module 1 is electrically connected with the input ends of the CPU module 2 and the storage and calculation integrated module 3, and the CPU module 2 is bidirectionally electrically connected with the storage and calculation integrated module 3; and an STT-MRAM array 12 disposed inside the storage and calculation integrated module 3.

In the above scheme, the collectors in the energy collection and management module are a photovoltaic solar panel, a wind power generation module, a wireless radio frequency charging module, a kinetic energy power generation module, and a thermal energy power generation module. When applied to an edge device, the module converts environmental energy into electric energy and stores it in the energy storage capacitor of the energy management unit. The DC/DC converter then converts this energy and supplies it to the CPU module and the storage and calculation integrated module, achieving self-powering. A self-powered supply is green and economical and needs no battery replacement, maintenance, or charging. In this architecture, the CPU module serves as the main general-purpose control module. For storage, the storage and calculation integrated module is used: it is a reconfigurable binary neural network accelerator module based on STT-MRAM and is bidirectionally electrically connected with the CPU module. The STT-MRAM is distributed in a crossed array inside the accelerator module, and each STT-MRAM contains a 1T1MTJ unit. Each 1T1MTJ unit in the array supports AND, OR, NOT, and XOR logic, and different logic can be realized by combining several 1T1MTJ units. This reconfigurability provides hardware support for adapting to energy fluctuations, so the module can process data in place of a remote computer. It is connected directly and locally to the edge device, consumes little energy, and incurs no network transmission delay.

In a preferred scheme, an energy collector 4 and an energy management unit are arranged inside the energy collection and management module 1, the energy management unit includes an energy storage capacitor 10 and a DC/DC converter 11, an output end of the energy storage capacitor 10 is connected with an input end of the DC/DC converter 11, and an output end of the energy collector 4 is electrically connected with an input end of the energy management unit.

In the above scheme, the energy collector delivers the harvested electric energy to the energy management unit, where it is stored in the energy storage capacitor. After conversion by the DC/DC converter, the energy meets the power requirements of the CPU module and the storage and calculation integrated module. Using a self-powered system saves energy, protects the environment, and avoids tedious battery maintenance. The DC/DC converter is a buck-boost converter, model LTC3129, and the number of converters installed depends on how many environmental energy sources need to be converted, which keeps the power supply stable. The device has an accurate RUN pin threshold and a maximum power point control function: the former provides stable turn-on behaviour, and the latter ensures that the energy collector delivers maximum power.
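
To make the capacitor-plus-converter energy path concrete, the following Python sketch estimates how much energy a storage capacitor can deliver to the load as it discharges from one voltage to another. It uses the standard relation E = C·V²/2. The capacitance, voltages, and converter efficiency are hypothetical placeholders chosen for illustration; they are not values from this document or from the LTC3129 datasheet.

```python
def capacitor_energy_j(capacitance_f, voltage_v):
    """Energy stored in an ideal capacitor: E = 0.5 * C * V^2."""
    return 0.5 * capacitance_f * voltage_v ** 2


def usable_energy_j(capacitance_f, v_start, v_cutoff, converter_efficiency):
    """Energy delivered to the load while discharging from v_start to v_cutoff."""
    if not 0 <= v_cutoff <= v_start:
        raise ValueError("require 0 <= v_cutoff <= v_start")
    if not 0 < converter_efficiency <= 1:
        raise ValueError("converter_efficiency must be in (0, 1]")
    released = (capacitor_energy_j(capacitance_f, v_start)
                - capacitor_energy_j(capacitance_f, v_cutoff))
    # The DC/DC converter passes on only a fraction of the released energy.
    return released * converter_efficiency


# Hypothetical example: 100 uF capacitor, 3.0 V down to 1.8 V, 85 % efficiency.
print(usable_energy_j(100e-6, 3.0, 1.8, 0.85))
```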

In a preferred scheme, the energy collector 4 comprises a photovoltaic solar panel 5, a wind power generation module 6, a wireless radio frequency charging module 7, a kinetic energy power generation module 8 and a heat energy power generation module 9, and the output ends of the photovoltaic solar panel 5, the wind power generation module 6, the wireless radio frequency charging module 7, the kinetic energy power generation module 8 and the heat energy power generation module 9 are all electrically connected with the input end of the energy management unit.

In the above scheme, the energy collection and management module is an environmental energy conversion device, which can convert wind energy, solar energy, radio frequency energy, kinetic energy, thermal energy and the like into electric energy to be supplied to the storage and calculation integrated module and the CPU module for use.

In a preferred scheme, the STT-MRAM arrays 12 are distributed inside the storage and calculation integrated module 3, and each STT-MRAM array 12 is internally provided with a 1T1MTJ unit.

In the above scheme, each 1T1MTJ unit in the array supports AND, OR, NOT, and XOR logic, and different logic can be realized by combining several 1T1MTJ units. This reconfigurability provides hardware support for adapting to energy fluctuations and lets the module, connected directly to the edge device, process data in place of a remote computer.

In a preferred embodiment, the application of the storage and calculation integrated processing architecture includes the following steps:

step one: designing, based on STT-MRAM, a reconfigurable in-memory processing architecture for self-powered scenarios, so that the architecture supports efficient operation of a binary neural network;

step two: mapping the binary neural network convolution calculation onto the hardware platform, the binary convolution calculation being realizable by different logic combinations;

step three: applying an adaptive software optimization method to adapt to energy fluctuations and use the available energy as fully as possible.

In the above solution, an in-memory computing architecture is first adopted to realize efficient binary neural network calculation. The accelerator module uses an in-memory processing platform based on spin transfer torque magnetic random access memory (STT-MRAM). FIG. 4 shows the principle by which the platform implements AND, OR, NOT, and XOR logic. Each 1T1MTJ unit can execute a single logic operation (AND, OR, NOT, or XOR), and a control signal C switches the unit between these logic functions. The STT-MRAM array of the hardware platform is composed of many 1T1MTJ units and is reconfigurable: by configuring the array, several 1T1MTJ units can be combined to implement more complex logic operations.
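
The reconfigurable behaviour of a single unit can be pictured with a small behavioural model. The following Python sketch is an assumption-based illustration, not a description of the circuit in FIG. 4: it represents the control signal C as an enumeration (the real signal encoding is not given in this text) and models only the logical result, not the electrical operation of the MTJ. The names `CellOp` and `cell_output` are hypothetical.

```python
from enum import Enum


class CellOp(Enum):
    """Assumed encoding of the control signal C as one of four logic modes."""
    AND = "and"
    OR = "or"
    NOT = "not"
    XOR = "xor"


def cell_output(control, a, b=0):
    """Logical output of one unit for input bits a and b under the given mode."""
    if a not in (0, 1) or b not in (0, 1):
        raise ValueError("inputs must be bits")
    if control is CellOp.AND:
        return a & b
    if control is CellOp.OR:
        return a | b
    if control is CellOp.NOT:
        return 1 - a          # single-input mode; b is ignored
    if control is CellOp.XOR:
        return a ^ b
    raise ValueError("unknown control value")


# The same unit produces different truth tables as the control value changes.
for op in CellOp:
    print(op.name, [cell_output(op, a, b) for a in (0, 1) for b in (0, 1)])
```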

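In a preferred embodiment, in step two, mapping the binary neural network onto the hardware platform includes the following steps:

step one: exploiting the reconfigurability of the hardware architecture, completing the multiply-add operations of the binary neural network with XOR, XNOR, AND, and OR combinational logic;

step two: obtaining the different ways of mapping the binary neural network calculation onto the hardware architecture;

step three: determining the system power consumption corresponding to each mapping mode, which provides a basis for the offline modeling of the adaptive software optimization method.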

In the above scheme, the convolution calculation of the binary neural network can be realized with XOR, and each 1T1MTJ unit of the accelerator module supports XOR, AND, OR, and NOT logic, so the convolution calculation can be mapped in several ways. FIG. 5 shows three mappings. The first maps the calculation directly onto the first column, each unit of which performs XOR logic. The second implements XOR with AND/OR logic, and columns 2 to 6 form the combinational logic that realizes the XOR. The third implements XOR with OR/NOT logic, and columns N-7 to N form the combinational logic that realizes the XOR.
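
The equivalence of the three mappings can be checked in software. The Python sketch below is illustrative: the text does not spell out the exact gate decomposition used in FIG. 5, so the two composite forms are assumptions. They were chosen because they need 5 and 8 gate evaluations, matching the 5 columns (2 to 6) and 8 columns (N-7 to N) mentioned in the text, which is suggestive but not confirmed.

```python
from itertools import product


def NOT(a):
    return 1 - a


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def xor_direct(a, b):
    """Mapping 1: a single XOR-capable unit."""
    return a ^ b


def xor_from_and_or(a, b):
    """Mapping 2 (assumed form): 2 NOT + 2 AND + 1 OR = 5 gate evaluations."""
    return OR(AND(a, NOT(b)), AND(NOT(a), b))


def xor_from_or_not(a, b):
    """Mapping 3 (assumed form): 5 NOT + 3 OR = 8 gate evaluations."""
    both = NOT(OR(NOT(a), NOT(b)))   # a AND b, built with De Morgan's law
    neither = NOT(OR(a, b))          # 1 only when a and b are both 0
    return NOT(OR(both, neither))


# All three mappings agree on every input pair.
for a, b in product((0, 1), repeat=2):
    assert xor_direct(a, b) == xor_from_and_or(a, b) == xor_from_or_not(a, b)
```

The trade-off the text relies on follows from this: the composite forms use more units and more steps, so each mapping has its own power and delay profile.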

In a preferred embodiment, the software optimization method of the storage and calculation integrated processing architecture comprises two parts, an offline modeling part and an online simulation part.

In the above scheme, the offline modeling part first obtains the power that a 1T1MTJ unit in the STT-MRAM array requires to execute each logic operation and the delay with which the operation completes. The power required for XOR, AND, OR, and NOT logic is P_xor, P_and, P_or, and P_not respectively, and the corresponding delays are T_xor, T_and, T_or, and T_not. Next, the environmental energy source is selected and divided into energy levels; as shown in FIG. 6, the source used here is a household WiFi signal. From the power and delay of each logic operation and the energy level division, the power and delay of the various XOR combinational logics are analyzed offline, and the offline decision table is designed accordingly. The online simulation part uses the decision table generated by offline modeling, as shown in FIG. 7. The online simulation process is described using four energy sampling periods, whose sampled powers are 50 μW, 820 μW, 360 μW, and 550 μW.
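
One way to picture the offline decision table is as a lookup from energy level to the chosen logic mapping. The Python sketch below is a minimal illustration under stated assumptions: the selection rule (pick the lowest-delay mapping whose power fits within the lower bound of the level, otherwise back up data) is assumed, the level lower bounds follow the 200 μW steps of this embodiment, and the per-mapping power and delay figures are invented placeholders, since the text gives them only symbolically (P_xor, T_xor, and so on).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Mapping:
    name: str
    power_uw: float   # power needed to run one layer with this mapping
    delay: float      # time to finish the layer, in arbitrary units


def build_decision_table(level_floor_uw: Dict[int, float],
                         mappings: List[Mapping]) -> Dict[int, Optional[Mapping]]:
    """For each energy level, choose the fastest mapping that the level can power."""
    table = {}
    for level, floor_uw in level_floor_uw.items():
        affordable = [m for m in mappings if m.power_uw <= floor_uw]
        # None means no mapping fits: stop executing and back up data.
        table[level] = min(affordable, key=lambda m: m.delay) if affordable else None
    return table


LEVEL_FLOOR_UW = {1: 0.0, 2: 200.0, 3: 400.0, 4: 600.0}
MAPPINGS = [  # placeholder numbers, not measurements
    Mapping("direct XOR", power_uw=500.0, delay=1.0),
    Mapping("AND/OR", power_uw=300.0, delay=5.0),
    Mapping("OR/NOT", power_uw=150.0, delay=8.0),
]

table = build_decision_table(LEVEL_FLOOR_UW, MAPPINGS)
for level, choice in sorted(table.items()):
    print(level, choice.name if choice else "back up data")
```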

In a preferred embodiment, in the offline modeling part, determining the offline decision table comprises dividing the energy trace into energy levels and then selecting the logic combination suited to each energy level.

In the above scheme, the neural network used for offline modeling is the LeNet network, a convolutional neural network with two convolution layers. The first-layer convolution kernel is 6x5x5x1 and the second-layer convolution kernel is 16x5x5x6. After the LeNet network is binarized, the first-layer convolution requires 150 XOR operations and the second-layer convolution requires 2400 XOR operations. According to the sampling trace of the household WiFi signal, the harvested power ranges from 0 to 1000 μW. The energy is divided into four levels: level 1 is 0-200 μW, level 2 is 200-400 μW, level 3 is 400-600 μW, and level 4 is above 600 μW. The three logic mappings of the binary neural network onto the hardware platform have the following costs. With the first logic, the first-layer convolution requires a power of 150·P_xor and completes with delay T_xor, and the second-layer convolution requires 2400·P_xor and completes with delay T_xor. With the second logic, the maximum power required is 150·P_and with delay T_and for the first layer, and 2400·P_and with delay T_and for the second layer. With the third logic, the maximum power required is 150·P_or with delay T_or for the first layer, and 2400·P_or with delay T_or for the second layer.
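
The operation counts and the level division can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic: the XOR count per layer is taken as the product of the kernel dimensions, which reproduces the 150 and 2400 stated in the text. How boundary values such as exactly 200 μW are classified is not stated, so the upper-inclusive rule below is an assumption, and the function names are hypothetical.

```python
from math import prod

CONV1_KERNEL = (6, 5, 5, 1)    # first-layer kernel shape from the text
CONV2_KERNEL = (16, 5, 5, 6)   # second-layer kernel shape from the text


def xor_count(kernel_shape):
    """Number of XOR operations, taken as the product of the kernel dimensions."""
    return prod(kernel_shape)


def energy_level(power_uw):
    """Map sampled power in microwatts to energy level 1-4 (upper bound inclusive)."""
    if power_uw < 0:
        raise ValueError("power must be non-negative")
    if power_uw <= 200:
        return 1
    if power_uw <= 400:
        return 2
    if power_uw <= 600:
        return 3
    return 4


assert xor_count(CONV1_KERNEL) == 150
assert xor_count(CONV2_KERNEL) == 2400
# The four sampled powers used in the embodiment fall into levels 1, 4, 2, 3.
assert [energy_level(p) for p in (50, 820, 360, 550)] == [1, 4, 2, 3]
```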

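In a preferred embodiment, in the online simulation part, determining the logic combination includes the following steps:

step one: traversing the energy trace table, determining the energy level of the current energy, and selecting a suitable logic combination according to the offline decision table to execute the binary neural network convolution calculation;

step two: selecting a low-power logic combination when the energy level is low, a lower-delay logic combination when the energy level is higher, and parallel execution when the energy is higher still;

step three: after the traversal of the trace table is complete, calculating the energy efficiency and throughput of the adopted architecture.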

In the above scheme, on entering the first energy sampling period, the system first determines which layer's convolution operation was completed last. The sampled power is then read as 50 μW, which belongs to energy level 1. According to the decision table, execution cannot continue at this energy level, so the data is backed up.

On entering the second energy sampling period, the system again determines which layer's convolution operation was completed last and concludes that the first-layer convolution should be executed. The sampled power is 820 μW, which belongs to energy level 4. According to the decision table for the first-layer convolution, execution can continue at this energy level, and the corresponding logic is selected to execute the first-layer convolution of the LeNet network. If the sampling period has not elapsed when the first-layer convolution finishes, the second-layer convolution is executed using the combinational logic that its decision table assigns to this energy level. This process repeats until the next sampling period begins.

On entering the third energy sampling period, the system determines which layer's convolution operation was completed last and concludes that the second-layer convolution should be executed. The sampled power is 360 μW, which belongs to energy level 2. According to the decision table for the second-layer convolution, execution can continue at this energy level, and the corresponding logic is selected to execute the second-layer convolution of the LeNet network. If the sampling period has not elapsed when the second-layer convolution finishes, the first-layer convolution is executed using the combinational logic that its decision table assigns to this energy level. This process repeats until the next sampling period begins.

On entering the fourth energy sampling period, the system determines which layer's convolution operation was completed last and concludes that the first-layer convolution should be executed. The sampled power is 550 μW, which belongs to energy level 3. According to the decision table for the first-layer convolution, execution can continue at this energy level, and the corresponding logic is selected to execute the first-layer convolution of the LeNet network. If the sampling period has not elapsed when the first-layer convolution finishes, the second-layer convolution is executed using the combinational logic that its decision table assigns to this energy level. This process repeats until the next sampling period begins.
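
The four-period walkthrough can be mimicked with a short simulation loop. The Python sketch below is a simplified illustration, not the patented simulator. It assumes that both layers take the same time at a given energy level, that level 1 cannot execute at all, and that the per-level delays and the period length are the placeholder values shown. The layer to run next is kept across periods, standing in for state held in non-volatile memory.

```python
def energy_level(power_uw):
    """Energy level 1-4 for a sampled power in microwatts (upper bound inclusive)."""
    if power_uw <= 200:
        return 1
    if power_uw <= 400:
        return 2
    if power_uw <= 600:
        return 3
    return 4


def simulate(trace_uw, delay_by_level, period):
    """Traverse the trace; return a per-period log and per-layer completion counts."""
    next_layer = 1                 # survives low-energy periods
    completed = {1: 0, 2: 0}
    log = []
    for power in trace_uw:
        level = energy_level(power)
        delay = delay_by_level.get(level)
        if delay is None:          # decision table says: cannot execute
            log.append((power, level, "backup"))
            continue
        elapsed, ran = 0, []
        while elapsed + delay <= period:   # keep alternating layers while time remains
            ran.append(next_layer)
            completed[next_layer] += 1
            next_layer = 2 if next_layer == 1 else 1
            elapsed += delay
        log.append((power, level, ran))
    return log, completed


# Placeholder timing: time units per layer at each level, and the period length.
DELAY_BY_LEVEL = {1: None, 2: 4, 3: 2, 4: 1}
log, completed = simulate([50, 820, 360, 550], DELAY_BY_LEVEL, period=5)
for entry in log:
    print(entry)
print(completed)
```

With these placeholder timings the second period ends after a first-layer convolution, so the third period starts with the second layer, and the fourth starts with the first layer again, mirroring the sequence described in the text.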

While embodiments of the invention have been described above, the invention is not limited to the applications set forth in the description and the embodiments. It is fully applicable in the various fields to which it pertains, and further modifications may readily be made by those skilled in the art. The invention is therefore not limited to the specific details shown and described herein, provided there is no departure from the general concept defined by the appended claims and their equivalents.
