Intelligent chip architecture and method for efficiently processing data

Document No.: 104898; Publication date: 2021-10-15

Abstract: This technology, "Intelligent chip architecture and method for efficiently processing data" (一种智能芯片架构和高效处理数据的方法), was designed by 宋大为 on 2021-07-23. The invention discloses an intelligent chip architecture comprising an analog-to-digital conversion unit, an integer-to-floating-point unit, a floating-point arithmetic unit, a storage unit, an arithmetic logic unit, a data transmission control unit, and a bus array control unit. The analog-to-digital conversion unit is connected to the integer-to-floating-point unit. The integer-to-floating-point unit is connected to both the bus array control unit and the data transmission control unit. The floating-point arithmetic unit is connected to the bus array control unit. The storage unit is connected to both the bus array control unit and the data transmission control unit. The arithmetic logic unit is connected to the data transmission control unit, which is also connected to the bus array control unit. The invention reduces bus wait cycles and bus congestion, and improves data-processing throughput and efficiency. The dual data bus and dual address bus scheme shortens the pipeline of the intelligent chip architecture, simplifies the architecture design, and reduces the number of clock cycles consumed.

1. An intelligent chip architecture, comprising: an analog-to-digital conversion unit, an integer-to-floating-point unit, a floating-point arithmetic unit, a storage unit, an arithmetic logic unit, a data transmission control unit, and a bus array control unit;

the analog-to-digital conversion unit is connected to the integer-to-floating-point unit;

the integer-to-floating-point unit is connected to the bus array control unit through a first dual data bus and to the data transmission control unit through a first signal bus;

the floating-point arithmetic unit is connected to the bus array control unit through a second dual data bus;

the storage unit is connected to the bus array control unit through a third dual data bus and to the data transmission control unit through a first dual address bus and a first control bus;

the arithmetic logic unit is connected to the data transmission control unit through a fourth dual data bus and a second signal bus;

the data transmission control unit is also connected to the bus array control unit through a fifth dual data bus, a second dual address bus, and a second control bus.

2. The intelligent chip architecture of claim 1, further comprising a plurality of peripheral units, each of the peripheral units being connected to the bus array control unit via a sixth dual data bus and a third dual address bus.

3. The intelligent chip architecture of claim 2, wherein each of the peripheral units includes a configuration register, the peripheral unit selecting the corresponding configuration register via the third dual address bus and obtaining the configuration parameters of the configuration register via the sixth dual data bus.

4. The intelligent chip architecture of claim 1, wherein the storage unit comprises a random access memory subunit and a non-volatile memory subunit.

5. The intelligent chip architecture of claim 1, wherein the bus array control unit comprises a transmission gate or a tri-state bus buffer.

6. The intelligent chip architecture of claim 1, wherein the data transmission control unit is configured to:

generate an operation timing sequence for controlling the operation of the intelligent chip architecture;

transmit a first control signal to the bus array control unit through the second control bus;

fetch instructions, decode them, transfer data to the arithmetic logic unit over the fourth dual data bus to perform data operations, and write data back to the storage unit over the first dual address bus;

and transmit a second control signal to the storage unit through the first control bus to select a data transmission channel of the third dual data bus and an address transmission channel of the first dual address bus.

7. The intelligent chip architecture of claim 6, wherein the bus array control unit is configured to:

receive the first control signal, and switch among the data buses and/or among the address buses according to the first control signal.

8. The intelligent chip architecture of claim 1, wherein the integer-to-floating-point unit is configured to:

receive a third control signal from the data transmission control unit through the first signal bus;

select input data of the first dual data bus according to the third control signal;

receive integer data from the analog-to-digital conversion unit;

and perform data conversion on the received data, and transmit the converted data over the first dual data bus, via the bus array control unit, to the corresponding unit.

9. The intelligent chip architecture of claim 1, wherein the arithmetic logic unit is configured to:

perform arithmetic operations on the data received through the fourth dual data bus, and flag its status through the second signal bus.

10. A method for efficiently processing data, the method using the intelligent chip architecture of any one of claims 1-9 to improve the efficiency of data processing.

Technical Field

The invention belongs to the technical field of computers, and particularly relates to an intelligent chip architecture and a method for efficiently processing data.

Background

An MCU (Microcontroller Unit), also abbreviated μC and known as a single-chip microcontroller, integrates ROM (Read-Only Memory), RAM (Random Access Memory), a CPU (Central Processing Unit), and I/O (input/output) interfaces on a single chip, and combines them differently for different applications to form different intelligent chip architectures. Existing intelligent chip architectures mainly have the following shortcomings:

(1) In digital signal processing, floating-point operations are often required to ensure computational precision or algorithm convergence. With the development of multimedia digital technology and the rise of artificial intelligence, MCUs face a growing demand for floating-point operations: digital filtering and image compression in MP3 players and unmanned aerial vehicles, and training and learning in artificial intelligence, all require large numbers of floating-point operations.

In a typical MCU design, the conversion of integer data to floating-point data is implemented in software and may take dozens of instruction cycles. In application scenarios with strict real-time requirements, such as real-time acquisition and digital filtering of multi-channel audio input from a microphone array, this can cause jitter or delay in signal processing and thereby distort the output signal. To achieve real-time performance the MCU often has to run at a higher operating frequency, and because the power consumption of the chip is proportional to the square of the frequency, raising the main frequency increases the chip's power consumption.

In addition, floating-point multiplication can be implemented either with fixed-point multiplication or with a single-instruction hardware floating-point multiplier. Fixed-point multiplication limits the representable range and precision of the data to varying degrees and is not general-purpose. A single-instruction hardware floating-point multiplier requires data to be loaded into the registers of the processor core's logic processing unit and requires a multi-stage pipeline architecture and several additional data buses, so it is also not general-purpose and is costly.

(2) Processor data operations can be divided into unary, binary, and ternary operations. Existing MCUs usually use a single data bus and a single address bus, so only one operand can be moved between the memory and the arithmetic logic unit at a time. With a pipelined design, the chip can move the two operands of a binary operation to the arithmetic logic unit one after another and thus appear to execute a binary operation in a single instruction cycle. However, whenever data must be written back to memory or read from memory, the pipeline has to be interrupted or stalled, so this is not truly single-cycle, or more efficient, execution. Such architectures are not suitable for concurrent transmission and processing of large data volumes.

In addition, because bus bandwidth is limited, large data accesses or DMA (Direct Memory Access) transfers from a peripheral to memory or from memory to a peripheral often require a bus arbiter to allocate the bus. Arbitration prevents the DMA from holding the bus so long that the CPU waits idle, and prevents the CPU from holding the bus so long that peripheral data cannot be transferred to memory in time and is overwritten by new data. To address this problem, the prior art lengthens the pipeline to improve instruction execution efficiency. However, the longer the pipeline, the more complex the chip architecture and the more likely a pipeline deadlock that forces a wait or a pipeline flush. Re-establishing the pipeline consumes clock cycles, so if the pipeline is flushed and rebuilt frequently, the execution efficiency of the chip falls.

Disclosure of Invention

The invention aims to provide an intelligent chip architecture and a method for efficiently processing data, which are used for solving at least one problem in the prior art.

In order to achieve the purpose, the invention adopts the following technical scheme:

In a first aspect, the present invention provides an intelligent chip architecture, comprising: an analog-to-digital conversion unit, an integer-to-floating-point unit, a floating-point arithmetic unit, a storage unit, an arithmetic logic unit, a data transmission control unit, and a bus array control unit;

the analog-to-digital conversion unit is connected to the integer-to-floating-point unit;

the integer-to-floating-point unit is connected to the bus array control unit through a first dual data bus and to the data transmission control unit through a first signal bus;

the floating-point arithmetic unit is connected to the bus array control unit through a second dual data bus;

the storage unit is connected to the bus array control unit through a third dual data bus and to the data transmission control unit through a first dual address bus and a first control bus;

the arithmetic logic unit is connected to the data transmission control unit through a fourth dual data bus and a second signal bus;

the data transmission control unit is also connected to the bus array control unit through a fifth dual data bus, a second dual address bus, and a second control bus.

In one possible design, the architecture further comprises a plurality of peripheral units, and each peripheral unit is connected to the bus array control unit through a sixth dual data bus and a third dual address bus.

In a possible design, each of the peripheral units includes a configuration register, and the peripheral unit selects the corresponding configuration register through the third dual address bus and obtains configuration parameters of the configuration register through the sixth dual data bus.

In one possible design, the storage unit includes a random access memory subunit and a non-volatile memory subunit.

In one possible design, the bus array control unit includes a transmission gate or a tri-state bus buffer.

In one possible design, the data transmission control unit is configured to:

generating an operation time sequence for controlling the operation of the intelligent chip architecture;

transmitting a first control signal to the bus array control unit through the second control bus;

fetching instructions, decoding them, transferring data to the arithmetic logic unit over the fourth dual data bus to perform data operations, and writing data back to the storage unit over the first dual address bus;

and transmitting a second control signal to the storage unit through the first control bus to select a data transmission channel of the third dual data bus and an address transmission channel of the first dual address bus.

In one possible design, the bus array control unit is configured to:

and receiving the first control signal, and switching among the data buses and/or among the address buses according to the first control signal.

In one possible design, the integer-to-floating-point unit is configured to perform:

receiving a third control signal from the data transmission control unit through the first signal bus;

selecting input data of the first dual data bus according to the third control signal;

receiving integer data from the analog-to-digital conversion unit;

and performing data conversion on the received data, and transmitting the converted data over the first dual data bus, via the bus array control unit, to the corresponding unit.

In one possible design, the arithmetic logic unit is configured to perform:

performing arithmetic operations on the data received through the fourth dual data bus, and flagging its status through the second signal bus.

In a second aspect, the present invention provides a method for efficiently processing data, the method using the intelligent chip architecture described in any one of the possible designs of the first aspect to improve the efficiency of data processing.

Beneficial effects:

1. The invention adopts a dual data bus and dual address bus design. According to the instructions of the computer system or the operating state of the bus array control unit, the data transmission control unit uses the second control bus to make the bus array control unit switch the working state of the data buses and/or address buses. This reduces bus wait cycles and bus congestion and improves data-processing throughput and efficiency. In addition, the dual data bus and dual address bus scheme shortens the pipeline of the intelligent chip architecture, simplifies the architecture design, reduces the clock cycles consumed, and improves execution efficiency.

2. The invention uses an integer-to-floating-point unit implemented in hardware, together with dual data buses for data transmission and dual address buses for address transmission, so that the two input operands of the floating-point arithmetic unit are read from and written to the random access memory unit directly through the bus array control unit and the data transmission control unit. This greatly reduces the clock cycles needed to convert integer data to floating-point data and meets the requirement of converting a large number of analog signals into data simultaneously.

3. Through the data transmission control unit, data can be transferred directly between the floating-point arithmetic unit and the random access memory unit via the bus array control unit and take part in computation there, so the system operates quickly; unlike in a traditional intelligent chip architecture, the core does not need to take part in data scheduling.

4. The analog-to-digital conversion unit is directly connected to the integer-to-floating-point unit, which enables fast data conversion and raises the operation rate.

5. The invention has a compact structure and greatly simplifies the hardware design for production. At a lower main frequency it can perform large-volume computations that a traditional architecture can only perform under a high-speed system clock, which greatly improves the energy-efficiency ratio and enables low-power operation.

Drawings

FIG. 1 is a block diagram of an intelligent chip architecture in an embodiment of the invention;

FIG. 2 is a schematic diagram of a data transmission control unit controlling a bus array control unit to perform data transmission in an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments in the present description, belong to the protection scope of the present invention.

Examples

As shown in FIGS. 1-2, in a first aspect, the present invention provides an intelligent chip architecture, including: an analog-to-digital conversion unit ADC, an integer-to-floating-point unit UI2F, a floating-point arithmetic unit FPU, a storage unit, an arithmetic logic unit ALU, a data transmission control unit DTC, and a bus array control unit BCC;

the analog-to-digital conversion unit ADC is connected to the integer-to-floating-point unit UI2F;

the analog-to-digital conversion unit ADC converts analog quantities into digital quantities, producing unsigned integer data that the processor can recognize. The ADC resolution is generally 8 to 24 bits, depending on the application field: a general-purpose MCU is typically configured with a 12-bit data width, while audio applications may use 24 bits to achieve high fidelity.

It should be noted that the analog-to-digital conversion unit ADC is connected to the integer-to-floating-point unit UI2F through a dedicated channel, for example parallel data lines, which is not limited here. Connecting the two units directly enables fast data conversion and raises the operation rate.

The integer-to-floating point unit UI2F is connected to the bus array control unit BCC via a first dual data bus and to the data transmission control unit DTC via a first signal bus;

It should be noted that the integer-to-floating-point unit UI2F is a hardware-implemented unit that converts integer data into IEEE-754 floating-point format, which reduces the instruction cycles that conventional software type conversion requires.
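As a minimal sketch of the kind of conversion UI2F performs in hardware, the following Python function builds the IEEE-754 single-precision bit pattern of an unsigned integer by locating the leading one, deriving the biased exponent, and left-aligning the remaining bits into the fraction field. The function name `uint_to_float32_bits`, the choice of single precision, and the 24-bit input limit (the upper ADC width mentioned above, which also means no rounding is needed) are assumptions for illustration; the source does not specify the circuit or the floating-point width.

```python
def uint_to_float32_bits(value: int, width: int = 24) -> int:
    """Return the IEEE-754 single-precision bit pattern of an unsigned integer.

    Hypothetical model: inputs are limited to 24 bits so that every value
    fits in the 24-bit significand (hidden bit + 23 fraction bits) exactly.
    """
    if width > 24 or not 0 <= value < (1 << width):
        raise ValueError("value must be an unsigned integer of at most 24 bits")
    if value == 0:
        return 0                       # +0.0 is all zero bits
    msb = value.bit_length() - 1       # position of the leading one
    exponent = msb + 127               # biased exponent
    # Remove the hidden leading one and left-align the rest in 23 bits.
    fraction = (value ^ (1 << msb)) << (23 - msb)
    return (exponent << 23) | fraction  # sign bit stays 0 (unsigned input)


if __name__ == "__main__":
    # A full-scale 12-bit ADC sample.
    print(hex(uint_to_float32_bits(4095, width=12)))
```

The priority-encode-and-shift structure maps naturally onto combinational logic, which is one way a hardware converter can avoid the multi-instruction software path described in the background.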

It should be noted that the first dual data bus includes an output bus UFBB and an input bus UFBA, the first signal bus includes a signal bus DUC, and the data transmission control unit DTC can control the input selection of the integer-to-floating-point unit UI2F through the signal bus DUC.

The floating-point arithmetic unit FPU is connected with the bus array control unit BCC through a second double data bus;

It should be noted that the second dual data bus comprises two input buses, FBA and FBB, and a single output bus FOB.

The storage unit is connected with the bus array control unit BCC through a third double data bus and is connected with the data transmission control unit DTC through a first double address bus and a first control bus;

The storage unit includes a random access memory subunit RAM and a non-volatile memory subunit NVM. Both subunits are connected to the bus array control unit BCC via the third dual data bus and to the data transmission control unit DTC via the first dual address bus and the first control bus. The third dual data bus comprises a data bus DBA and a data bus DBB; the first dual address bus comprises an address bus ABA and an address bus ABB; and the first control bus comprises data selection control lines DSA and DSB and address selection control lines ASA and ASB.

The arithmetic logic unit ALU is connected with the data transmission control unit DTC through a fourth double data bus and a second signal bus;

it should be noted that the fourth dual data bus includes dual input buses CDA and CDB and an output bus COB, and the second signal bus includes a DACH signal bus.

The data transmission control unit is also connected with the bus array control unit through a fifth double data bus, a second double address bus and a second control bus.

It should be noted that the fifth dual data bus includes a data bus DBDA and a data bus DBDB, the second dual address bus includes an address bus ABDA and an address bus ABDB, and the second control bus includes a DBCH bridge control bus.

In an optional implementation manner, the system further comprises a plurality of peripheral units, and each peripheral unit is connected with the bus array control unit through a sixth double data bus and a third double address bus; each peripheral unit comprises a configuration register, the corresponding configuration register is selected by the peripheral unit through the third double-address bus, and the configuration parameters of the configuration register are obtained through the sixth double-data bus.

The peripheral unit includes a high-speed peripheral subunit and a low-speed peripheral subunit, where the high-speed peripheral subunit includes, but is not limited to, peripheral devices such as USB, ethernet, SDIO, and the low-speed peripheral subunit includes, but is not limited to, devices such as a serial port, SPI, a timer, an ADC, and a DAC; the sixth double data bus comprises a high-speed peripheral data bus HDB and a low-speed peripheral data bus SDB, and the third double address bus comprises a high-speed peripheral address bus HAB and a low-speed peripheral address bus SAB.

It should be noted that, in the names given to the dual data buses and dual address buses above, the bus whose name ends in A is by default the main bus and the bus whose name ends in B is the secondary bus. In practice, however, there is no substantive primary/secondary distinction between the two: the data transmission control unit selects a bus according to the operating state of the chip and the instruction being executed, so that the bandwidth of the dual bus is fully used for data transmission. If bus A and bus B are both idle, the main bus can be selected preferentially, which is not specifically limited here.
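The selection rule described above can be sketched as a small function. This is an illustrative assumption about the policy, not the patented logic: the name `select_bus` and the busy flags are hypothetical, and the sketch ignores any instruction-specific target bus.

```python
from typing import Optional


def select_bus(a_busy: bool, b_busy: bool) -> Optional[str]:
    """Pick one bus of a dual-bus pair.

    Returns "A", "B", or None when both buses are occupied and the
    transfer has to wait (hypothetical policy for illustration).
    """
    if not a_busy:
        return "A"   # default main bus, also preferred when both are idle
    if not b_busy:
        return "B"   # fall back to the secondary bus when A is occupied
    return None


if __name__ == "__main__":
    print(select_bus(a_busy=True, b_busy=False))
```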

In an optional implementation manner, the data transmission control unit is configured to:

Generating an operation timing sequence for controlling the operation of the intelligent chip architecture, where the operation timing sequence includes but is not limited to the start-up sequence when the chip is powered on and reset and the control timing for internal program execution.

Transmitting a first control signal to the bus array control unit BCC over the second control bus;

Specifically, a first control signal is transmitted to the bus array control unit BCC through the DBCH bridge control bus to control the bus switches inside the BCC. The switches connect and disconnect the data buses (HDB, SDB, UFBB, UFBA, DBA, DBB, FBA, FBB, FOB, DBDA, and DBDB) to and from one another, and likewise the address buses (HAB, SAB, ABDA, and ABDB), thereby switching between buses and enabling multidirectional data transmission.

Fetching instructions, decoding them, transferring data to the arithmetic logic unit ALU through the fourth dual data bus to perform data operations, and writing data back to the storage unit through the first dual address bus;

Specifically, the data transmission control unit DTC uses a two-stage pipeline: instruction fetch and write-back are combined into the first stage, and execution forms the second stage. Thanks to the dual data bus and dual address bus, fetch and write-back can each use one set of data and address buses, so both can share the same pipeline stage. The dual buses also simplify decoding, so decoding can be completed within one register-transfer-level (RTL) delay after instruction fetch, in preparation for execution.

Data can be transmitted to the arithmetic logic unit ALU through the input bus CDA or the input bus CDB to perform data operations; the result of the arithmetic logic unit ALU is received through the output bus COB, and the result data is written back to the random access memory unit RAM and/or the non-volatile memory unit NVM through the address bus ABA or the address bus ABB. In addition, a short pipeline can be re-established faster when a jump instruction is encountered, which reduces jitter in system clock execution.

Transmitting a second control signal to the storage unit through the first control bus to select a data transmission channel of the third dual data bus and an address transmission channel of the first dual address bus.

Specifically, according to how busy the data buses of the bus array control unit BCC are, or according to a system command, the data transmission control unit DTC controls the data transmission channel of the RAM and/or the NVM through the data selection control lines DSA and DSB: the high and low levels on DSA and DSB determine whether the data of the RAM and/or the NVM is transmitted on the data bus DBA or on the data bus DBB.

Likewise, according to how busy the address buses of the bus array control unit BCC are, or according to a system command, the data transmission control unit DTC controls the address transmission channel of the RAM and/or the NVM through the address selection control lines ASA and ASB: the high and low levels on ASA and ASB determine whether the address enters the address unit inside the RAM and/or the NVM from the address bus ABA or from the address bus ABB.

It should be added that, inside the data transmission control unit DTC, the address bus ABDA is directly connected to the address bus ABA, and the address bus ABDB is directly connected to the address bus ABB.
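To illustrate the two-stage pipeline of the data transmission control unit DTC, the sketch below lists what each clock cycle could be doing when fetch and write-back share the first stage (each on its own bus set) and execution forms the second stage. The exact cycle timing is an assumption made for illustration, and the function name `two_stage_schedule` is hypothetical; the source only states which operations share a stage.

```python
def two_stage_schedule(program):
    """Return per-cycle activity for a fetch+write-back / execute pipeline.

    Assumed timing: instruction i is fetched in cycle i, executed in
    cycle i + 1, and written back in cycle i + 2, where the write-back
    overlaps the fetch of instruction i + 2 on the other bus set.
    """
    n = len(program)
    timeline = []
    for cycle in range(n + 2):
        timeline.append({
            "fetch": program[cycle] if cycle < n else None,
            "execute": program[cycle - 1] if 1 <= cycle <= n else None,
            "write_back": program[cycle - 2] if cycle >= 2 else None,
        })
    return timeline


if __name__ == "__main__":
    for cycle, activity in enumerate(two_stage_schedule(["i0", "i1", "i2"])):
        print(cycle, activity)
```

Under this assumed timing, in steady state one instruction is fetched, one executed, and one written back in every cycle, which is only possible because fetch and write-back do not compete for a single bus.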

As an alternative embodiment, the bus array control unit BCC is configured to:

and receiving the first control signal, and switching among the data buses and/or among the address buses according to the first control signal.

Specifically, the bus array control unit BCC receives a first control signal sent by the data transmission control unit DTC through the DBCH bridge control bus, and implements gating and disconnecting between the data buses (HDB, SDB, UFBB, UFBA, DBA, DBB, FBA, FBB, FOB, DBDA, and DBDB) and between the address buses (HAB, SAB, ABDA, and ABDB) according to the first control signal, thereby implementing switching between buses and further implementing multidirectional transmission of data.

As an alternative embodiment, the bus array control unit BCC comprises transmission gates or tri-state bus buffers; switching among the data buses and among the address buses is performed by the transmission gates or tri-state bus buffers of the bus array control unit BCC.
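The gating behavior of the bus array control unit BCC can be modeled as a switch matrix in which each closed gate forwards the value on one bus to another. The following is a toy behavioral sketch, not a description of the actual circuit: the class `BusArray`, its methods, and the encoding of the first control signal as a set of (source, destination) pairs are all assumptions for illustration. Only the bus names come from the text.

```python
class BusArray:
    """Toy model of a bus switch matrix built from gates.

    A bus value of None stands for an undriven (high-impedance) bus.
    """

    def __init__(self, bus_names):
        self.values = {name: None for name in bus_names}
        self.closed_gates = set()

    def apply_control(self, gates):
        """Model the first control signal as the set of gates to close."""
        for src, dst in gates:
            if src not in self.values or dst not in self.values:
                raise KeyError(f"unknown bus in gate {(src, dst)}")
        self.closed_gates = set(gates)   # every other gate is opened

    def drive(self, bus, value):
        """Drive a bus and forward the value through every closed gate."""
        self.values[bus] = value
        for src, dst in self.closed_gates:
            if src == bus:
                self.values[dst] = value


if __name__ == "__main__":
    bcc = BusArray(["DBA", "DBB", "FBA", "FBB", "FOB"])
    # Route both memory data buses to the two FPU input buses at once.
    bcc.apply_control({("DBA", "FBA"), ("DBB", "FBB")})
    bcc.drive("DBA", 1.5)
    bcc.drive("DBB", 2.5)
    print(bcc.values)
```

In this model, closing the two gates together is what lets both operands of a binary floating-point operation reach the FPU in the same transfer.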

As an alternative embodiment, the integer to floating point unit UI2F is configured to:

receiving a third control signal of the data transmission control unit DTC through the first signal bus;

specifically, the integer-to-floating-point number unit UI2F receives a third control signal of the data transmission control unit DTC through the signal bus DUC, thereby selecting input data of the input bus UFBA.

Selecting input data of the first dual data bus according to the third control signal; wherein the input data includes, but is not limited to, data in the random access memory unit RAM, data in the non-volatile memory unit NVM and/or calculation result data of the arithmetic logic unit ALU.

Receiving integer data of the analog-to-digital conversion unit;

specifically, the integer to floating point unit UI2F receives the integer data of the ADC through a dedicated data channel.

And performing data conversion on the received data, and transmitting the converted data to the corresponding unit through the bus array control unit by the first double data bus.

Specifically, the received data, which may come from the random access memory unit RAM, the non-volatile memory unit NVM, the calculation results of the arithmetic logic unit ALU, or the integer output of the analog-to-digital conversion unit ADC, undergoes data conversion, and the converted data is transmitted over the output bus UFBB, via the bus array control unit BCC, to the corresponding unit. The corresponding unit includes, but is not limited to, the random access memory unit RAM, the non-volatile memory unit NVM, the arithmetic logic unit ALU, and the floating-point arithmetic unit FPU.

As an alternative implementation, the arithmetic logic unit ALU is configured to:

and performing arithmetic operation on the data received through the fourth double data bus, and performing state marking through the second signal bus.

Specifically, the arithmetic logic unit ALU receives data through the input bus CDA or the input bus CDB and performs arithmetic operations on it, including but not limited to addition, subtraction, multiplication, division, and shift operations. It then outputs the result on the output bus COB and flags its post-operation status on the DACH signal bus. The DACH signal bus also carries the control signals produced when the data transmission control unit DTC decodes an instruction.
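A behavioral sketch of an ALU that returns both a result and status flags is shown below. The function name `alu`, the 32-bit default width, the unsigned interpretation of operands, and the particular flags (`zero`, `overflow`, `div_by_zero`) are assumptions for illustration; the source does not list the status bits carried on the DACH signal bus.

```python
def alu(op: str, a: int, b: int, width: int = 32):
    """Compute an unsigned fixed-width operation and hypothetical status flags.

    Returns (result, flags): the result models the value on the output
    bus COB, and the flags model status reported on a signal bus.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    div_by_zero = op == "div" and b == 0
    if op == "add":
        full = a + b
    elif op == "sub":
        full = a - b
    elif op == "mul":
        full = a * b
    elif op == "div":
        full = 0 if div_by_zero else a // b
    elif op == "shl":
        full = a << b
    else:
        raise ValueError(f"unsupported operation: {op}")
    result = full & mask                  # wrap to the bus width
    flags = {
        "zero": result == 0,
        "overflow": full != result,       # result did not fit in `width` bits
        "div_by_zero": div_by_zero,
    }
    return result, flags


if __name__ == "__main__":
    print(alu("add", 0xFFFFFFFF, 1))
```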

It should be added that, in order to reduce the complexity of the overall architecture, after the instruction of the data transmission control unit is decoded, the source data of the shift operation may be transmitted and received through the input bus CDA, and the result data of the shift operation is placed on the input bus CDB for transmission; for the addition, subtraction, multiplication and division operations, the data added, subtracted, multiplied and divided is placed on the input bus CDA for transmission, and the result data added, subtracted, multiplied and divided is placed on the input bus CDB for transmission.

Further, when the data transmission control unit receives the result data of the shift operation and the result data of addition, subtraction, multiplication and division, it can select the free bus or the target bus to transmit data outwards according to the system instruction and/or the busy state of the data bus DBDA and the data bus DBDB.

In an optional implementation, the random access memory unit RAM specifically includes an SRAM memory and a DRAM memory, and adopts a dual bus architecture internally so that the SRAM memory and the DRAM memory can be accessed synchronously, which accelerates data access.

It should be noted that the data transmission control unit DTC may generate two address parameters simultaneously and transmit them to the address bus ABA and the address bus ABB. It then controls the dual-port bus selector to select an internal memory through the address selection control bus ASA and the address selection control bus ASB, and controls the dual-port bus selector to output the contents of the selected memory to the data bus DBA and the data bus DBB through the data selection control bus DSA and the data selection control bus DSB.
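As a rough illustration of why two simultaneous addresses help, the following sketch models a memory that can serve two addresses per access. The class `DualPortRam`, its method names, and the cost of one cycle per access are modelling assumptions, not details taken from the source.

```python
from typing import List, Tuple


class DualPortRam:
    """Toy model of a RAM reachable over two address buses and two data buses."""

    def __init__(self, size: int) -> None:
        self.cells: List[float] = [0.0] * size
        self.cycles = 0  # number of access cycles spent so far

    def read_pair(self, addr_a: int, addr_b: int) -> Tuple[float, float]:
        # Both addresses (ABA, ABB) are presented at once, and both words
        # come back (DBA, DBB) in the same access cycle.
        self.cycles += 1
        return self.cells[addr_a], self.cells[addr_b]

    def read_single(self, addr: int) -> float:
        # Single-bus style access: one word per cycle.
        self.cycles += 1
        return self.cells[addr]


ram = DualPortRam(8)
ram.cells[2], ram.cells[5] = 1.5, 2.5
pair = ram.read_pair(2, 5)  # one cycle for two operands
```

Fetching the same two operands with `read_single` would cost two cycles in this model, which is the saving the double address bus is intended to provide.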

In an optional implementation, the non-volatile memory unit NVM includes, but is not limited to, a readable and writable Flash memory, a ferroelectric memory, and a read-only ROM memory. The non-volatile memory unit NVM has a slower access speed than the random access memory unit RAM, is mainly used for storing program code and data constants, and is accessed less frequently than the random access memory unit RAM. A single bus architecture may therefore be adopted inside the non-volatile memory unit NVM, which reduces manufacturing complexity and cost.

The code stored in the non-volatile memory unit NVM can be transmitted to the data transmission control unit DTC through the dedicated data bus CBUS for decoding. The constants and initial memory values stored in the non-volatile memory unit NVM can be output to the data bus DBA and the data bus DBB by controlling the dual-port bus selector through the data selection control bus DSA and the data selection control bus DSB.

In an optional implementation, the floating point unit FPU may be designed as a standalone data processing unit to reduce the complexity of the overall architecture, and floating point units may be added to or removed from the chip architecture according to requirements and cost. The operations in the floating point unit are binary (two-operand) operations; the operands of the floating point unit FPU can be transmitted through the input bus FBA, and the operation result can be transmitted through the output bus FOB.

As a practical application of the present embodiment, the smart chip architecture of the present embodiment can be widely applied to digital signal processing and neural network learning, which is specifically described below:

As shown in Fig. 2, the floating point unit FPU provides a multiply-add instruction that executes in a single cycle, and digital signal processing and neural network learning involve a large number of multiply-add operations, so applying the floating point unit FPU to these tasks can greatly increase the operation rate.

The digital filter usually operates on the data using the formula

$$y(n) = \sum_{k=0}^{N} a(k)\,x(n-k)$$

where $y(n)$ is the filter output, $a(k)$ are the coefficients, and $x(n-k)$ is the historical input data; the filter output is thus obtained through a series of additions and multiplications.
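The multiply-add structure of the filter described above can be sketched as follows. This is a minimal direct-form example; the function name `fir_output` and the convention that samples before the start of the input are treated as zero are assumptions made for illustration.

```python
from typing import Sequence


def fir_output(a: Sequence[float], x: Sequence[float], n: int) -> float:
    """Compute y(n) as the sum over k of a(k) * x(n - k).

    Assumption: input samples with an index outside x are treated as zero.
    """
    acc = 0.0
    for k, coeff in enumerate(a):
        idx = n - k
        if 0 <= idx < len(x):
            acc += coeff * x[idx]  # one multiply-add per filter tap
    return acc


coeffs = [0.5, 0.25]
samples = [1.0, 2.0, 4.0]
outputs = [fir_output(coeffs, samples, n) for n in range(len(samples))]
```

Each output sample costs one multiply-add per coefficient, which is why a single-cycle multiply-add instruction maps naturally onto this loop.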

The neural network is divided into an input layer, a hidden layer and an output layer, wherein the values of the hidden layer and the output layer are obtained from the products of the weights and the values of each neural node of the previous layer. The input value of the hidden layer is given by a formula (not reproduced here) in which $x_{ij}$ represents the input value of the $j$-th neuron of the $i$-th layer of the neural network, $n$ represents the number of neural units of the current hidden layer, $o_{(i-1)h}$ represents the weight value of the $h$-th neuron of the $(i-1)$-th hidden layer, and $b$ represents the bias of the neuron. Therefore, the more neural units and hidden layers there are, the larger the amount of computation, and the time complexity of the multiplications alone can reach the order of $O(n^2)$. The time cost is very high if a floating point unit FPU with a single-cycle multiply-add instruction is not used for the operation.
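To make the quadratic growth concrete, the following sketch evaluates one fully connected layer and counts its multiply-add operations. The fully connected structure, the absence of an activation function, and the names `layer_forward`, `weights`, `inputs` and `bias` are assumptions for illustration; they are not taken from the source formula.

```python
from typing import List, Sequence, Tuple


def layer_forward(
    weights: Sequence[Sequence[float]],
    inputs: Sequence[float],
    bias: Sequence[float],
) -> Tuple[List[float], int]:
    """Return (neuron inputs of the next layer, number of multiply-adds)."""
    outputs: List[float] = []
    macs = 0
    for row, b in zip(weights, bias):
        acc = b  # start from the neuron bias
        for w, v in zip(row, inputs):
            acc += w * v  # one multiply-add per connection
            macs += 1
        outputs.append(acc)
    return outputs, macs


values, mac_count = layer_forward(
    weights=[[1.0, 2.0], [0.5, -1.0]],
    inputs=[2.0, 3.0],
    bias=[0.5, 0.0],
)
```

With n inputs feeding n neurons, the inner statement runs n times n times, which matches the quadratic order stated above.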

Based on this analysis, digital filtering and neural network learning are highly similar: both are mathematical operations with a regular structure. While the data transmission control unit DTC is decoding, if the input bus FBA and the input bus FBB are idle, data can be transferred directly to the floating point unit FPU. In practical applications, the number of multiply-add stages is generally far greater than 16 or 32, whereas a conventional MCU architecture has only a small number of registers. When the number of multiply-add stages exceeds the number of registers, some variables and parameters have to be pushed onto and popped from the stack, which consumes additional system clock cycles.

In this embodiment, the coefficients of the digital filter or the neuron weights of the neural network are stored in one contiguous region of the non-volatile memory unit or the random access memory unit, such as $x_{1n} \ldots x_{1n-N}$ in Fig. 2, and the historical values of the filter or the outputs of the neurons in the previous layer of the neural network are stored in another contiguous region of the non-volatile memory unit or the random access memory unit, such as $x_{kn} \ldots x_{kn-N}$ in Fig. 2. The data transmission control unit DTC controls the bus switch inside the bus array control unit BCC to switch the buses, so that $x_{1n} \ldots x_{1n-N}$ and $x_{kn} \ldots x_{kn-N}$, that is, the coefficients of the digital filter, the neuron weights, the historical values of the filter and/or the outputs of the neurons in the previous layer, are transmitted to the floating point unit FPU through the data bus FBA and the data bus FBB for operation. The operation result of the floating point unit FPU can be output through the data bus FOB and stored into $y_{0} \ldots y_{0-M}$ in Fig. 2.
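The data flow above, in which two contiguous operand regions are streamed into the FPU in parallel, can be approximated with a simple cycle-count model. The function `stream_mac`, the cost of one bus cycle per operand fetch, and the choice to ignore the cost of the multiply-add itself are assumptions of this sketch rather than figures from the source.

```python
from typing import Sequence, Tuple


def stream_mac(
    coeffs: Sequence[float],
    history: Sequence[float],
    dual_bus: bool = True,
) -> Tuple[float, int]:
    """Accumulate coeffs[i] * history[i] and count operand-fetch bus cycles."""
    if len(coeffs) != len(history):
        raise ValueError("both operand regions must have the same length")
    # Assumption: with two data buses (FBA and FBB) both operands arrive in
    # one cycle; with a single bus they are fetched one after the other.
    fetch_cycles_per_step = 1 if dual_bus else 2
    acc = 0.0
    cycles = 0
    for c, h in zip(coeffs, history):
        cycles += fetch_cycles_per_step
        acc += c * h  # multiply-add performed by the FPU
    return acc, cycles


result_dual, cycles_dual = stream_mac([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
result_single, cycles_single = stream_mac([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], dual_bus=False)
```

Under these assumptions the numerical result is identical in both modes, while the operand-fetch cycles are halved when both buses are used.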

It should be noted that using the intelligent chip architecture of this embodiment to implement the operations in digital signal processing and neural network learning differs from using a traditional vector processor. Although a traditional vector processor can process the multiply-add of multiple data items with one instruction, the required bus width is a multiple of the number of multiply-add operations processed simultaneously, which makes the design difficult. The intelligent chip architecture of this embodiment cannot execute multiple operations with one instruction, but its structure is simple and its RTL layout circuitry is greatly reduced, which helps reduce the wafer area and the number of mask layers, thereby reducing cost.

Based on the above disclosure, the present embodiment has the following beneficial effects:

1. In this embodiment, a double data bus and double address bus design is adopted. According to the instruction of the computer system or the working state of the bus array control unit, the data transmission control unit controls the bus array control unit, through the second control bus, to switch the working states of the data bus and/or the address bus. This reduces bus wait cycles and bus congestion and improves data processing throughput, thereby improving data processing efficiency. In addition, the double data bus and double address bus mode can effectively reduce the pipeline length of the intelligent chip architecture, simplify the architecture design, reduce the number of clock cycles consumed, and improve execution efficiency.

2. In this embodiment, an integer-to-floating-point unit implemented in hardware is adopted, the double data buses are used for data transmission, and the double address buses are used for address transmission, so that the two input operands of the floating point arithmetic unit are read from and written to the random access memory unit directly through the bus array control unit and the data transmission control unit. This greatly reduces the clock cycles needed for integer-to-floating-point conversion and meets the requirement of converting a large number of analog signals simultaneously.

3. In this embodiment, the data transmission control unit can, through the bus array control unit, control data to be transferred directly between the floating point arithmetic unit and the random access memory unit and to take part in computation, so that the system achieves fast operation without requiring the core of the intelligent chip architecture to participate in data scheduling, as a traditional architecture does.

5. This embodiment has a compact structure and can greatly simplify the hardware design in production, so that at a lower main frequency the invention can perform large-data-volume computations that a traditional architecture can carry out only under a high-speed system clock. The energy efficiency ratio is thereby greatly improved and low-power operation is achieved.

In a second aspect, the present invention provides a method for efficiently processing data, the method using the micro-processing architecture as described in any one of the possible designs of the first aspect to improve the efficiency of data processing.

It should be noted that the method applies the micro-processing architecture described in any one of the possible designs of the first aspect and can specifically achieve efficient data processing in digital signal processing and neural network learning. It should be understood, however, that the method is not limited to these two scenarios and is also applicable to any other scenario in which data can be processed using the intelligent chip architecture, which is not limited herein.

Finally, it should be noted that: the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
