Circuit and method for data processing in a neural network system

Document No.: 1772648 Publication date: 2019-12-03

Note: this technology, "Circuit and method for data processing in a neural network system" (神经网络系统中数据处理的电路和方法), was designed by Fei Xudong (费旭东), Zhou Hong (周红) and Yuan Honghui (袁宏辉) on 2018-05-24. Main content: The application provides a circuit and a method for data processing in a neural network system that make data formats of multiple precisions compatible within the hardware architecture of a neural network. The circuit includes a serial addition circuit, a first nonlinear mapping circuit, an accumulation circuit and a second nonlinear mapping circuit. The serial addition circuit serially obtains each input data of at least one input data and the weight parameter corresponding to each input data, and performs a serial addition operation on each input data and its corresponding weight parameter to obtain at least one first data. The first nonlinear mapping circuit performs a first nonlinear mapping on each first data of the at least one first data to obtain at least one second data. The accumulation circuit accumulates the at least one second data. The second nonlinear mapping circuit performs a second nonlinear mapping on the accumulation result output by the accumulation circuit to obtain output data.

1. A circuit for data processing in a neural network system, characterized by comprising a serial addition circuit, a first nonlinear mapping circuit, an accumulation circuit and a second nonlinear mapping circuit, wherein

The serial addition circuit serially obtains each input data of at least one input data and the weight parameter corresponding to each input data, and performs a serial addition operation on each input data and the weight parameter corresponding to each input data to obtain at least one first data;

The first nonlinear mapping circuit performs a first nonlinear mapping on each first data of the at least one first data to obtain at least one second data, wherein the first nonlinear mapping is an exponential transform with base 2;

The accumulation circuit accumulates the at least one second data;

The second nonlinear mapping circuit performs a second nonlinear mapping on the accumulation result output by the accumulation circuit to obtain output data, wherein the second nonlinear mapping is determined according to the nonlinear mapping of the neural network and the first nonlinear mapping.

2. The circuit according to claim 1, characterized by further comprising:

A serial-to-parallel conversion circuit, which serially obtains at least part of the data of each first data and outputs the at least part of the data of each first data in parallel to the first nonlinear mapping circuit.

3. The circuit according to claim 2, characterized in that the first nonlinear mapping circuit comprises a nonlinear mapping unit, a shift control circuit and a shift register circuit, wherein

The nonlinear mapping unit obtains the fractional part of each first data in parallel, performs the first nonlinear mapping on the fractional part of each first data to obtain third data, and outputs the third data to the shift register circuit;

The shift control circuit obtains the integer part of each first data and, according to the integer part of each first data, outputs a control signal to the shift register circuit;

The shift register circuit shifts the third data according to the control signal to obtain the at least one second data.

4. The circuit according to claim 3, characterized in that the control signal is k clock signals, and the shift register circuit serially outputs the third data and, after outputting the third data, serially outputs k zeros according to the k clock signals to obtain the second data, wherein k is the value of the integer part of the first data and k is a positive integer.

5. The circuit according to claim 3, characterized in that the shift register circuit shifts the third data left by k bits according to the control signal to obtain the second data, wherein k is the value of the integer part of the first data and k is a positive integer.

6. The circuit according to claim 5, characterized in that the control signal is k clock signals.

7. The circuit according to claim 5, characterized in that the control signal is j clock signals, wherein the i-th clock signal of the j clock signals corresponds to the i-th bit, counted from low to high, of the binary number of the integer part, j is the number of bits of the binary number of the integer part, i and j are positive integers, and 1 ≤ i ≤ j;

Wherein, when the i-th bit of the binary number is non-zero, the shift register circuit shifts the third data left by 2^(i-1) bits according to the i-th clock signal;

When the i-th bit of the binary number is 0, the shift register circuit does not shift the third data.

8. A method for data processing in a neural network system, characterized in that the method is executed by a data processing circuit, the data processing circuit comprising a serial addition circuit, a first nonlinear mapping circuit, an accumulation circuit and a second nonlinear mapping circuit, the method comprising:

Serially obtaining, by the serial addition circuit, each input data of at least one input data and the weight parameter corresponding to each input data, and performing a serial addition operation on each input data and the weight parameter corresponding to each input data to obtain at least one first data;

Performing, by the first nonlinear mapping circuit, a first nonlinear mapping on each first data of the at least one first data to obtain at least one second data, wherein the first nonlinear mapping is an exponential transform with base 2;

Accumulating, by the accumulation circuit, the at least one second data;

Performing, by the second nonlinear mapping circuit, a second nonlinear mapping on the accumulation result output by the accumulation circuit to obtain output data, wherein the second nonlinear mapping is determined according to the nonlinear mapping of the neural network and the first nonlinear mapping.

9. The method according to claim 8, characterized in that the data processing circuit further comprises a serial-to-parallel conversion circuit, and the method further comprises:

Serially obtaining, by the serial-to-parallel conversion circuit, at least part of the data of each first data, and outputting the at least part of the data of each first data in parallel to the first nonlinear mapping circuit.

10. The method according to claim 9, characterized in that the first nonlinear mapping circuit comprises a nonlinear mapping unit, a shift control circuit and a shift register circuit, wherein performing, by the first nonlinear mapping circuit, the first nonlinear mapping on each first data of the at least one first data to obtain at least one second data comprises:

Obtaining, by the nonlinear mapping unit, the fractional part of each first data in parallel, performing the first nonlinear mapping on the fractional part of each first data to obtain third data, and outputting the third data to the shift register circuit;

Obtaining, by the shift control circuit, the integer part of each first data and, according to the integer part of each first data, outputting a control signal to the shift register circuit;

Shifting, by the shift register circuit, the third data according to the control signal to obtain the at least one second data.

11. The method according to claim 10, characterized in that the control signal is k clock signals, and shifting, by the shift register circuit, the third data according to the control signal to obtain the at least one second data comprises:

Serially outputting, by the shift register circuit, the third data and, after outputting the third data, serially outputting k zeros according to the k clock signals to obtain the second data, wherein k is the value of the integer part of the first data and k is a positive integer.

12. The method according to claim 10, characterized in that shifting, by the shift register circuit, the third data according to the control signal to obtain the at least one second data comprises:

Shifting, by the shift register circuit, the third data left by k bits according to the control signal to obtain the second data, wherein k is the value of the integer part of the first data and k is a positive integer.

13. The method according to claim 12, characterized in that the control signal is k clock signals.

14. The method according to claim 12, characterized in that the control signal is j clock signals, wherein

The i-th clock signal of the j clock signals corresponds to the i-th bit, counted from low to high, of the binary number of the integer part, j is the number of bits of the binary number of the integer part, i and j are positive integers, and 1 ≤ i ≤ j;

Shifting, by the shift register circuit, the third data left by k bits according to the control signal to obtain the second data comprises:

When the i-th bit of the binary number is non-zero, shifting, by the shift register circuit, the third data left by 2^(i-1) bits according to the i-th clock signal;

When the i-th bit of the binary number is 0, not shifting the third data by the shift register circuit.

Technical field

This application relates to the field of circuits, and more particularly to a circuit and a method for data processing in a neural network system.

Background

Neural networks and deep learning algorithms have been applied with great success and are still developing rapidly. The industry generally expects new ways of computing to enable more widespread and more complex intelligent applications. In recent years neural networks and deep learning algorithms have achieved outstanding results in image recognition, so the industry has begun to pay attention to the optimization and efficient implementation of deep learning algorithms and has invested in research on optimizing neural network algorithms.

Driven by applications, the industry has begun to research and develop efficient neural network acceleration hardware and chips. Research and practice show that, compared with the mature fp32 and fp16 formats, many lower-precision data formats are also feasible for neural network operations, including INT8 and other 4-bit, 2-bit or even 1-bit implementations. Data formats of different precisions perform differently in different networks and applications, and each can meet application requirements in particular scenarios.

Therefore, how to make data formats of multiple precisions compatible within the hardware architecture of a neural network is a problem that urgently needs to be solved.

Summary of the invention

The application provides a circuit and a method for data processing in a neural network system that make data formats of multiple precisions compatible within the hardware architecture of a neural network.

In a first aspect, a circuit for data processing in a neural network system is provided. The circuit includes a serial addition circuit, a first nonlinear mapping circuit, an accumulation circuit and a second nonlinear mapping circuit, wherein

The serial addition circuit serially obtains each input data of at least one input data and the weight parameter corresponding to each input data, and performs a serial addition operation on each input data and the weight parameter corresponding to each input data to obtain at least one first data;

The first nonlinear mapping circuit performs a first nonlinear mapping on each first data of the at least one first data to obtain at least one second data, wherein the first nonlinear mapping is an exponential transform with base 2;

The accumulation circuit accumulates the at least one second data;

The second nonlinear mapping circuit performs a second nonlinear mapping on the accumulation result output by the accumulation circuit to obtain output data, wherein the second nonlinear mapping is determined according to the nonlinear mapping of the neural network and the first nonlinear mapping.

Therefore, in the embodiments of the present application, the adder serially obtains the input data and the weight parameters and performs a serial addition operation on them to obtain at least one first data. The first nonlinear mapping circuit then performs the first nonlinear mapping on the at least one first data to obtain at least one second data, the accumulation circuit accumulates the at least one second data, and the second nonlinear mapping circuit performs the second nonlinear mapping on the accumulation result output by the accumulation circuit. Because the input of the adder's operands and the calculation itself are serialized, handling multiple precisions reduces to counting clock ticks, so a neural network computing architecture compatible with multiple precisions can be realized.

In a specific implementation, because neural network computation proceeds stage by stage and the output of one stage is the input of the next, the logarithmic transform of the input data can be performed by the second nonlinear mapping circuit of the previous stage. That is, the accumulated sum of the previous stage passes through the second nonlinear mapping and is then sent back to the input of the computing unit, ready for the next calculation. Optionally, the weight parameters can be computed in advance, stored in memory and fetched into the computing unit when they are needed.

Optionally, the second nonlinear mapping is determined according to the nonlinear mapping of the neural network and the inverse mapping of the first nonlinear mapping. Specifically, when the first nonlinear mapping is the exponential function with base 2, its inverse mapping is the logarithmic function with base 2.

Taking 8-bit data as an example, serial processing is 8 times slower than parallel processing, but a serial processing circuit consumes one eighth of the resources of a parallel processing circuit. If the same resources are used to implement 8 times as many serial processing units, overall performance remains essentially unchanged. The embodiments of the present application can therefore preserve data processing efficiency while achieving multi-precision compatibility.

Optionally, in the embodiments of the present application, the accumulation performed by the accumulator and the second nonlinear mapping performed by the second nonlinear mapping circuit require relatively high precision in practical neural network computation, so they need not satisfy the multi-precision compatibility requirement. Accordingly, the accumulator and the second nonlinear mapping circuit may process data in parallel or serially.

With reference to the first aspect, in some possible implementations of the first aspect, the circuit further includes:

A serial-to-parallel conversion circuit, which serially obtains at least part of the data of each first data and outputs the at least part of the data of each first data in parallel to the first nonlinear mapping circuit.

Optionally, the serial-to-parallel conversion circuit may be a first serial-to-parallel conversion circuit, which serially obtains the data output by the adder and outputs the data in parallel to the first nonlinear mapping circuit, so that the first nonlinear mapping circuit performs the first nonlinear mapping on the data.

Optionally, the first nonlinear mapping circuit may process data serially. Specifically, the first nonlinear mapping of the integer part can be serialized, whereas the first nonlinear mapping of the fractional part cannot. In one possible implementation, the serial-to-parallel conversion circuit may be a second serial-to-parallel conversion circuit, which serially obtains the fractional part of the data output by the adder and outputs the fractional part in parallel to the first nonlinear mapping circuit, so that the first nonlinear mapping circuit performs the first nonlinear mapping on the fractional part.

With reference to the first aspect, in some possible implementations of the first aspect, the first nonlinear mapping circuit includes a nonlinear mapping unit, a shift control circuit and a shift register circuit, wherein

The nonlinear mapping unit obtains the fractional part of each first data in parallel, performs the first nonlinear mapping on the fractional part of each first data to obtain third data, and outputs the third data to the shift register circuit;

The shift control circuit obtains the integer part of each first data and, according to the integer part of each first data, outputs a control signal to the shift register circuit;

The shift register circuit shifts the third data according to the control signal to obtain the at least one second data.

With reference to the first aspect, in some possible implementations of the first aspect, the control signal is k clock signals, and the shift register circuit serially outputs the third data and, after outputting the third data, serially outputs k zeros according to the k clock signals to obtain the second data, where k is the value of the integer part of the first data and k is a positive integer. This makes the shift operation simpler and helps to reduce power consumption.

With reference to the first aspect, in some possible implementations of the first aspect, the shift register circuit shifts the third data left by k bits according to the control signal to obtain the second data, where k is the value of the integer part of the first data and k is a positive integer.

With reference to the first aspect, in some possible implementations of the first aspect, the control signal is k clock signals.

With reference to the first aspect, in some possible implementations of the first aspect, the control signal is j clock signals, where the i-th clock signal of the j clock signals corresponds to the i-th bit, counted from low to high, of the binary number of the integer part, j is the number of bits of the binary number of the integer part, i and j are positive integers, and 1 ≤ i ≤ j;

Wherein, when the i-th bit of the binary number is non-zero, the shift register circuit shifts the third data left by 2^(i-1) bits according to the i-th clock signal;

When the i-th bit of the binary number is 0, the shift register circuit does not shift the third data.

This reduces the number of shift operations, which helps to reduce system power consumption and improve system performance.
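The bit-weighted shifting described above can be illustrated with a short software model. This is only a sketch under stated assumptions: the function name `shift_by_control_bits` and its return of a shift-operation count are illustrative and do not come from the application, and the hardware shift register is modelled with Python integers.

```python
def shift_by_control_bits(third_data: int, k: int, j: int) -> tuple[int, int]:
    """Shift third_data left by k using at most j shift operations.

    The i-th step (1 <= i <= j) looks at the i-th bit of k, counted from
    the low end. A non-zero bit triggers a left shift by 2**(i-1) positions;
    a zero bit triggers no shift. Returns (shifted value, shifts performed).
    """
    if k < 0 or k >= (1 << j):
        raise ValueError("k must fit in j bits")
    value = third_data
    shifts = 0
    for i in range(1, j + 1):
        if (k >> (i - 1)) & 1:        # i-th bit of the integer part is non-zero
            value <<= 1 << (i - 1)    # shift left by 2^(i-1) positions
            shifts += 1
    return value, shifts


# k = 5 is binary 101, so only two shifts (by 1 and by 4) are performed
# instead of five single-position shifts.
result, shifts = shift_by_control_bits(3, 5, 3)
```

In this model the number of shift operations is bounded by j, the bit width of the integer part, rather than by its value k.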

Optionally, the embodiments of the present application further include a precision configuration controller, which controls the clock and data routing of at least one of the serial adder, the serial-to-parallel conversion register, the shift controller and the shift register. By controlling the clock and data routing of at least one of these components, the precision configuration controller controls the bit width, that is, the precision, of the data processed by that component.

In a second aspect, a method for data processing in a neural network system is provided. The method processes input data using any circuit of the first aspect and includes:

Serially obtaining, by the serial addition circuit, each input data of at least one input data and the weight parameter corresponding to each input data, and performing a serial addition operation on each input data and the weight parameter corresponding to each input data to obtain at least one first data;

Performing, by the first nonlinear mapping circuit, a first nonlinear mapping on each first data of the at least one first data to obtain at least one second data, wherein the first nonlinear mapping is an exponential transform with base 2;

Accumulating, by the accumulation circuit, the at least one second data;

Performing, by the second nonlinear mapping circuit, a second nonlinear mapping on the accumulation result output by the accumulation circuit to obtain output data, wherein the second nonlinear mapping is determined according to the nonlinear mapping of the neural network and the first nonlinear mapping.

For each step of the method of the second aspect, reference may be made to the corresponding operation of the corresponding module of the data processing circuit of the first aspect; details are not repeated here.

Brief description of the drawings

Fig. 1 is a schematic diagram of an n-stage (n-layer) neural network computation model.

Fig. 2 is a schematic diagram of a circuit for data processing in a conventional neural network.

Fig. 3 is a schematic diagram of a circuit for data processing in a neural network provided by an embodiment of the present application.

Fig. 4 is a schematic diagram of a circuit for data processing in a neural network provided by an embodiment of the present application.

Fig. 5 is a schematic diagram of a specific shift register of an embodiment of the present application.

Fig. 6 is a schematic flowchart of a method for data processing in a neural network system provided by an embodiment of the present application.

Detailed description of the embodiments

The technical solutions of the application are described below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of an n-stage (n-layer) neural network computation model. The calculation performed by one neuron of the neural network corresponds to the following formula:

y = f(x1*w1 + x2*w2 + ... + xn*wn + b),

where xi is a data input, wi is a weight parameter, i = 1, 2, ..., n, b is a constant, f() is a specific function, and i and n are positive integers. The weighted sum in the formula is the dot product of the two vectors (x1, x2, ..., xn) and (w1, w2, ..., wn). This operation can be completed on a single computing unit by performing the multiplications one at a time and accumulating the products.
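As a small worked illustration of the neuron formula, the following sketch computes the weighted sum by step-by-step multiply-accumulate and then applies f. The function names `neuron` and `relu`, and the choice of ReLU as the specific function f, are assumptions made for this example only; the application does not name a particular f.

```python
from typing import Callable, Sequence


def neuron(xs: Sequence[float], ws: Sequence[float], b: float,
           f: Callable[[float], float]) -> float:
    """Compute y = f(x1*w1 + ... + xn*wn + b) by multiply-accumulate."""
    if len(xs) != len(ws):
        raise ValueError("inputs and weights must have the same length")
    acc = b
    for x, w in zip(xs, ws):
        acc += x * w          # one multiplication and one accumulation per step
    return f(acc)


def relu(v: float) -> float:
    """Example activation only; an assumption, not taken from the application."""
    return max(0.0, v)


# 1.0*0.5 + 2.0*(-1.0) + 3.0*2.0 + 0.5 = 5.0, and relu(5.0) = 5.0
y = neuron([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.5, relu)
```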

Fig. 2 is a schematic diagram of a circuit for data processing in a conventional neural network. Neural network computation proceeds stage by stage, and the output of one stage is the input of the next. Specifically, the output of the previous stage serves as the data input (x1, x2, ..., xn). The hardware multiplier 201 multiplies x1, x2, ..., xn by their corresponding weight parameters, the accumulator 202 then accumulates x1*w1 + x2*w2 + ... + xn*wn + b, and the nonlinear mapping unit 203 applies the nonlinear mapping y = f(accumulated result) to obtain the calculation result, which completes the output of the data.

However, the circuit in Fig. 2 requires the hardware multiplier 201, which increases the complexity of the computing unit and the resources it occupies. To simplify multiplication, a logarithmic transform can be used to convert the complex multiplication into an addition. Fig. 3 is a schematic diagram of a circuit for data processing in a neural network provided by an embodiment of the present application.

The working principle of Fig. 3 is explained below. Suppose data A and data B are to be multiplied. A and B can first be transformed logarithmically to obtain a = log2(A) and b = log2(B). Here A is assumed to be data and B a weight parameter. The logarithmic transform of A can then be performed by the second nonlinear mapping circuit of the previous stage; that is, the accumulated sum of the previous stage passes through the second nonlinear mapping and is sent back to the input of the computing unit, ready for the next calculation. The logarithm of the weight parameter B can be computed in advance, stored in memory and fetched into the computing unit when it is needed.

Then, the adder 301 performs the addition c = a + b.

Then, the first nonlinear mapping circuit 302 performs the first nonlinear mapping on c. Here the first nonlinear mapping is the exponential function with base 2, that is, d = 2^c. It follows that d = A*B.

Then, the accumulator 303 accumulates the multiple products obtained in this way, which gives the accumulated sum of these products.

Then, the second nonlinear mapping circuit 304 performs the second nonlinear mapping on the accumulated sum output by the accumulator. The data obtained after the second nonlinear mapping is sent back to the input of the computing unit, ready for the next calculation.

Here, the second nonlinear mapping is determined according to the nonlinear mapping of the neural network and the inverse mapping of the first nonlinear mapping. Specifically, when the first nonlinear mapping is the exponential function with base 2, its inverse mapping is the logarithmic function with base 2.
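The log-domain data flow described for Fig. 3 can be sketched in floating point as follows. This is a minimal model under stated assumptions: operands are strictly positive (sign handling is not covered by this passage), the function names are illustrative, and composing the network nonlinearity f with log2 is one plausible reading of how the second nonlinear mapping is "determined according to" f and the inverse of the first mapping.

```python
import math
from typing import Callable, Sequence


def log_domain_mac(log_xs: Sequence[float], log_ws: Sequence[float]) -> float:
    """Accumulate products A*B given a = log2(A) and b = log2(B)."""
    acc = 0.0
    for a, b in zip(log_xs, log_ws):
        c = a + b            # adder 301: addition replaces multiplication
        d = 2.0 ** c         # first nonlinear mapping 302: d = 2^c = A*B
        acc += d             # accumulator 303
    return acc


def second_mapping(acc: float, f: Callable[[float], float]) -> float:
    """Assumed form of the second mapping: apply f, then return to log2 domain.

    Requires f(acc) > 0 in this simplified model.
    """
    return math.log2(f(acc))


# A = (2, 4), B = (8, 0.5): 2*8 + 4*0.5 = 18
log_xs = [math.log2(2.0), math.log2(4.0)]
log_ws = [math.log2(8.0), math.log2(0.5)]
total = log_domain_mac(log_xs, log_ws)
next_stage_input = second_mapping(total, lambda v: v)  # identity f, for illustration
```

The value returned by `second_mapping` is already in the log2 domain, so it can be fed directly to the adder of the next stage.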

One scheme is to process data in parallel during data processing in a neural network; that is, each data processing unit used in the neural network reads, processes and outputs data in parallel. However, any parallel implementation has difficulty supporting multiple precisions. For example, if an 8-bit parallel processing unit is to process 4-bit data, that is, the 8-bit unit is used as a 4-bit unit, 4-bit operation is achieved only by wasting half of the resources. When the processing unit reads data from memory, the circuit must implement two separate read modes, one for 4 bits and one for 8 bits. Moreover, at the same memory bandwidth, 4-bit data requires twice the read and processing rate of 8-bit data, so the number of processing units must also be doubled to handle the 4-bit data. Parallel data processing therefore makes multi-precision compatibility difficult to achieve.

In the embodiments of the present application, data can be processed serially during data processing in a neural network. For example, 8-bit data is not read one byte at a time but is read and processed bit by bit, which makes multi-precision compatibility possible. Specifically, reading and processing 8-bit data takes 8 clock ticks, reading and processing 4-bit data takes 4 clock ticks, and other data widths behave in the same way. The embodiments of the present application thus move a flexible, changeable requirement into the time domain. Because the number of clock ticks can be controlled flexibly, whereas a specific parallel hardware circuit is fixed, the embodiments of the present application can flexibly achieve multi-precision compatibility.
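The idea that precision becomes a count of clock ticks can be modelled with a single bit-serial reader that serves any data width. This is a sketch only: the function name `read_serial` is hypothetical, and the low-bit-first ordering is an assumption chosen to match a serial adder that works from the low bit to the high bit.

```python
from typing import Iterator


def read_serial(bitstream: Iterator[int], width: int) -> tuple[int, int]:
    """Read one value of `width` bits, one bit per clock tick, low bit first.

    Returns (value, ticks used). The same routine serves 8-bit, 4-bit or
    any other width; only the tick count changes.
    """
    value = 0
    for tick in range(width):
        value |= next(bitstream) << tick
    return value, width


# One stream carrying two 4-bit values followed by one 8-bit value.
stream = iter([1, 0, 1, 1,  0, 1, 0, 0,  1, 1, 1, 1, 0, 0, 0, 0])
first = read_serial(stream, 4)    # bits 1,0,1,1 low bit first
second = read_serial(stream, 4)   # bits 0,1,0,0 low bit first
third = read_serial(stream, 8)    # eight ticks for an 8-bit value
```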

In addition, taking 8-bit data as an example, serial processing is 8 times slower than parallel processing, but a serial processing circuit consumes one eighth of the resources of a parallel processing circuit. If the same resources are used to implement 8 times as many serial processing units, overall performance remains essentially unchanged. The embodiments of the present application can therefore preserve data processing efficiency while achieving multi-precision compatibility.

It should be noted that the accumulation performed by the accumulator 303 in Fig. 3 and the second nonlinear mapping performed by the second nonlinear mapping circuit 304 require relatively high precision in practical neural network computation, so they need not satisfy the multi-precision compatibility requirement. Accordingly, the accumulator 303 and the second nonlinear mapping circuit 304 may process data in parallel or serially, which is not limited in the embodiments of the present application.

In the embodiments of the present application, the adder 301 needs to process data serially; that is, the adder 301 may be a serial adder, also called a serial full adder. Specifically, a serial adder adds bit by bit from the low bit to the high bit, time-multiplexing a single one-bit full adder.
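A behavioural sketch of such a serial adder is shown below: one full-adder step is reused on every clock tick, with the carry held between ticks. The helper names and the restriction to unsigned, equal-width operands are assumptions made for this illustration.

```python
def serial_add(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Add two unsigned numbers given as low-bit-first bit lists.

    A single one-bit full adder is time-multiplexed: each loop iteration
    models one clock tick. The result has one extra bit for the final carry.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)                 # sum bit of the full adder
        carry = (a & b) | (carry & (a ^ b))       # carry kept for the next tick
    out.append(carry)
    return out


def to_bits(value: int, width: int) -> list[int]:
    """Low-bit-first bit list of an unsigned value."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


# 4-bit operands take 4 ticks, 8-bit operands take 8 ticks, same adder.
s4 = from_bits(serial_add(to_bits(13, 4), to_bits(11, 4)))      # 13 + 11
s8 = from_bits(serial_add(to_bits(200, 8), to_bits(100, 8)))    # 200 + 100
```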

As an optional embodiment, if the serial interface of the adder 301 matches that of the subsequent component, no register (Reg) is needed to buffer the serial output, and the data output by the adder 301 is fed directly into the subsequent component.

As an optional embodiment, if the subsequent component connected to the adder 301 cannot accept serial input, each bit of the calculation result of the adder 301 can be saved in a register, and the calculation result can then be output from the register in parallel to the subsequent component.

In the embodiments of the present application, the first nonlinear mapping circuit 302 performs the first nonlinear mapping on the data output by the adder 301. The first nonlinear mapping is the exponential transform with base 2, that is, y = 2^x. As an optional embodiment, the data handled by circuit 302 includes an integer part and a fractional part. For the integer part, the first nonlinear mapping is equivalent to a shift operation; for the fractional part, the first nonlinear mapping can be evaluated with a small look-up table or mapping table.
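The split between a shift for the integer part and a small look-up table for the fractional part can be sketched in fixed point as follows. The bit widths `FRAC_BITS` and `OUT_FRAC_BITS`, the table contents and the restriction to non-negative inputs are all assumptions made for this example; the application does not specify them here.

```python
FRAC_BITS = 2        # assumed number of fractional bits in the input x
OUT_FRAC_BITS = 8    # assumed number of fractional bits in the output

# Look-up table for 2^(fraction): one entry per fractional code.
LUT = [round(2 ** (i / (1 << FRAC_BITS)) * (1 << OUT_FRAC_BITS))
       for i in range(1 << FRAC_BITS)]


def exp2_fixed(x_fixed: int) -> int:
    """Approximate 2^x for a non-negative fixed-point x.

    x_fixed encodes x * 2**FRAC_BITS. The fractional part indexes the
    table and the integer part k becomes a left shift by k positions.
    The result encodes 2^x * 2**OUT_FRAC_BITS.
    """
    if x_fixed < 0:
        raise ValueError("this sketch handles non-negative x only")
    k = x_fixed >> FRAC_BITS                    # integer part
    frac = x_fixed & ((1 << FRAC_BITS) - 1)     # fractional part
    return LUT[frac] << k                       # table value shifted left by k


# x = 3.0 is encoded as 12; x = 2.5 is encoded as 10.
exact_case = exp2_fixed(12) / (1 << OUT_FRAC_BITS)     # 2^3
approx_case = exp2_fixed(10) / (1 << OUT_FRAC_BITS)    # close to 2^2.5
```

With only `2**FRAC_BITS` entries, the table stays small even when the integer part, and therefore the value range, grows.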

As an optional embodiment, the first nonlinear mapping circuit 302 may process data in parallel. In this case, the data processing circuit of the embodiments of the present application further includes a first serial-to-parallel conversion circuit, which serially obtains the data output by the adder 301 and outputs the data in parallel to the first nonlinear mapping circuit, so that the first nonlinear mapping circuit performs the first nonlinear mapping on the data.

As an optional embodiment, the first nonlinear mapping circuit 302 may process data serially. Specifically, the first nonlinear mapping of the integer part can be serialized, whereas the first nonlinear mapping of the fractional part cannot. In one possible implementation, the data processing circuit of the embodiments of the present application further includes a second serial-to-parallel conversion circuit, which serially obtains the fractional part of the data output by the adder 301 and outputs the fractional part in parallel to the first nonlinear mapping circuit, so that the first nonlinear mapping circuit performs the first nonlinear mapping on the fractional part.

It should be noted that, in the embodiments of the present application, although the first nonlinear mapping of the fractional part cannot be serialized, the resource cost of processing it in parallel is acceptable. In practical applications, the number of integer bits usually needs to be guaranteed first in order to obtain a sufficiently large value range. As a specific example, when the required precision is 1 bit to 3 bits, usually only integer bits are needed; when the required precision is 4 bits, there may be one fractional bit or none (that is, all 4 bits are integer bits); and when the required precision is 4 bits to 8 bits, there may be 1 to 4 fractional bits, with the remaining bits being integer bits.

Then, the accumulator 303 accumulates the at least one data output by the first nonlinear mapping circuit, and the second nonlinear mapping circuit 304 performs the second nonlinear mapping on the accumulation result output by the accumulator 303.

It should be noted that, in the embodiment of the present application, the adder 301 may be replaced by an addition circuit capable of performing the function of the adder 301, and the accumulator 303 may be replaced by an accumulation circuit capable of performing the function of the accumulator 303. The embodiment of the present application does not specifically limit this.

Therefore, by serializing the input of the adder's operands and the calculation process, the embodiment of the present application transfers the handling of multi-precision compatibility to the counting of clock ticks. On this basis, a neural network computing circuit compatible with multiple precisions can be realized.

Fig. 4 shows a schematic diagram of a circuit for data processing in a neural network provided by an embodiment of the present application. It should be understood that Fig. 4 shows exemplary modules or units of the data processing circuit, but these modules or units are only examples; the embodiment of the present application may also include other modules or units, or variations of the modules or units in Fig. 4. In addition, the example in Fig. 4 is intended only to help those skilled in the art understand and implement the embodiment of the present application, not to limit its scope. Those skilled in the art may make equivalent transformations or modifications based on the example given here, and such transformations or modifications still fall within the scope of the embodiment of the present application.

In the embodiment of the present application, the data processing circuit includes a serial adder 401, a precision configuration controller 402, a serial-to-parallel conversion register 403, a nonlinear mapping unit 404, a shift controller 405, a shift register 406, an accumulator 407 and a second nonlinear mapping circuit 408. The nonlinear mapping unit 404, the shift controller 405 and the shift register 406 may be components of the first nonlinear mapping circuit 302. In an optional embodiment, the serial-to-parallel conversion register 403 may also be a component of the first nonlinear mapping circuit 302, which is not limited in the embodiment of the present application. In another optional embodiment, the shift controller 405 may be a local control circuit, included in the precision configuration controller 402, that controls the shift register 406, which is likewise not limited in the embodiment of the present application.

In the embodiment of the present application, the precision configuration controller 402 may control the clock and data routing of at least one of the serial adder 401, the serial-to-parallel conversion register 403, the shift controller 405 and the shift register 406. By controlling the clock and data routing of these components, the precision configuration controller 402 controls the bit width, that is, the precision, of the data processed by at least one of the serial adder 401, the serial-to-parallel conversion register 403, the shift controller 405 and the shift register 406.

It should be understood that the precision configuration controller may include local control circuits that respectively control the serial adder 401, the serial-to-parallel conversion register 403, the shift controller 405 or the shift register 406. Through these local control circuits, the serial adder 401, the serial-to-parallel conversion register 403, the shift controller 405 or the shift register 406 can each be controlled individually, that is, in terms of clock and data routing. For example, the serial adder 401 can be controlled through the local control circuit corresponding to the serial adder 401. As another example, the shift register 406 can be controlled through the local control circuit corresponding to the shift register 406.

The serial adder 401 serially obtains each input data of at least one input data and the weight parameter corresponding to each input data, and performs a serial addition operation on each input data and its corresponding weight parameter to obtain at least one first data.

Here, the input data is, for example, the data A described above, and the weight parameter corresponding to each input data is, for example, the weight parameter B described above. In the embodiment of the present application, one bit of the input data or of the weight parameter may be called an operand. Specifically, one bit of the input data may be called serial operand 1, and one bit of the weight parameter corresponding to the input data may be called serial operand 2.

Specifically, serial operand 1 and serial operand 2 are read from the memory synchronously, one bit each per clock cycle, as the inputs of the serial adder 401. In this way, the precision configuration controller 402 can ensure that two operands of precision M have been read completely after M clock cycles, where M is a positive integer; as an example, M may be 8. Here, precision is in fact the bit width of the data. In addition, operands need to be stored in the memory in a way that suits serial reading. For example, the byte stored at each memory address is composed of one bit taken from each of 8 operands, so each byte read from the memory supplies one bit to each of the 8 operands.
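The storage arrangement described above, in which each byte holds one bit from each of up to 8 operands, can be illustrated with a minimal sketch. The function names `pack_bit_sliced` and `read_serial`, the lane assignment (operand index equals bit position inside the byte) and the low-bit-first address order are assumptions made for illustration only; the text does not specify them.

```python
def pack_bit_sliced(operands, width):
    """Pack up to 8 operands so that address n holds bit n of every operand.

    Assumption: address 0 holds the least significant bits, and operand
    number `lane` occupies bit position `lane` inside each byte.
    """
    if len(operands) > 8:
        raise ValueError("one byte can hold one bit from at most 8 operands")
    memory = []
    for n in range(width):                        # one address per bit position
        byte = 0
        for lane, value in enumerate(operands):
            byte |= ((value >> n) & 1) << lane
        memory.append(byte)
    return memory


def read_serial(memory, lane):
    """Return one operand as a low-bit-first stream, one bit per byte read."""
    return [(byte >> lane) & 1 for byte in memory]


# Two 8-bit operands chosen for illustration.
a, b = 0b01110101, 0b00111001
mem = pack_bit_sliced([a, b], width=8)
stream_a = read_serial(mem, 0)
stream_b = read_serial(mem, 1)
```

With this layout, a single byte read per clock cycle delivers the next bit of every operand at once, which is what allows the two serial operands to be read synchronously.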

Then, serial operand 1 and serial operand 2 are sent to the serial adder 401, which completes a one-bit addition. Specifically, the bits of the input data and of the weight parameter enter the serial adder 401 in sequence, and after M clock cycles the addition over the full bit width of the input data and the weight parameter is complete, yielding the sum of the input data and the weight parameter, that is, the first data described above.

Specifically, for a single full adder, let the n-th operand (bit) of the input data be denoted An, the n-th operand (bit) of the weight parameter be denoted Bn, the carry from the lower bit be denoted Cn, the sum bit be denoted Yn, and the carry to the higher bit be denoted Cn+1. Here, n is a positive integer less than or equal to the bit width of the input data or the weight parameter.

Before calculation starts, the serial adder 401 needs to be initialized, that is, the register that holds the carry needs to be cleared first. Optionally, the initialization of the serial adder 401 may be performed by the precision configuration controller 402. Specifically, the precision configuration controller 402 may clear the registers that the serial adder 401 uses to hold the input operands and the low-order carry. The current bits of the two operands are then added together with the carry from the lower bit, producing the current data output and the carry output to the higher bit. The operation rule can be expressed in the form (Cn+1, Yn) = (An, Bn, Cn), where each line below gives the output pair on the left and the corresponding input triple on the right:

(0,0)=(0,0,0)

(0,1)=(0,1,0)

(0,1)=(1,0,0)

(1,0)=(1,1,0)

(0,1)=(0,0,1)

(1,0)=(0,1,1)

(1,0)=(1,0,1)

(1,1)=(1,1,1)
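The eight rules above are the truth table of a one-bit full adder. As a minimal sketch, the table can be checked against the usual Boolean expressions for the sum and carry bits; the function name `full_adder` and the dictionary `TRUTH_TABLE` are illustrative names, not part of the described circuit.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: returns (carry_out, sum_bit), i.e. (Cn+1, Yn)."""
    sum_bit = a ^ b ^ c_in
    carry_out = (a & b) | (c_in & (a ^ b))
    return carry_out, sum_bit


# Input triple (An, Bn, Cn) -> output pair (Cn+1, Yn), copied from the rules above.
TRUTH_TABLE = {
    (0, 0, 0): (0, 0),
    (0, 1, 0): (0, 1),
    (1, 0, 0): (0, 1),
    (1, 1, 0): (1, 0),
    (0, 0, 1): (0, 1),
    (0, 1, 1): (1, 0),
    (1, 0, 1): (1, 0),
    (1, 1, 1): (1, 1),
}

matches = all(full_adder(*inputs) == outputs
              for inputs, outputs in TRUTH_TABLE.items())
```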

At this point, the carry to the higher bit can be stored in a buffer register for use in the next addition. Meanwhile, when the current data output is a fractional bit, it can be saved in the fractional-bit serial-to-parallel conversion register 403.

Then, the value of n is increased by 1 and the addition of the next bit is performed, until all bits of the input data and the weight parameter have been processed. As an example, if the bit width of the input data and the weight parameter is 8, the calculation ends after 8 clock cycles.

In the embodiment of the present application, the result computed by the serial adder 401 can be output serially, bit by bit. Assuming that the serial operands are input to the serial adder 401 from right to left (that is, fractional part first, integer part afterwards), the output follows the same right-to-left order: the fractional part of the sum, that is, the fractional part of the first data, is output first, followed by the integer part of the sum, that is, the integer part of the first data.
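The serial addition and the fraction-first output order can be sketched as a loop with a one-bit carry register. This is a behavioural model only, not the circuit itself; the names `serial_add` and `split_fraction_first` and the operand values (0111.0101 and 0011.1001, with 4 fractional bits) are chosen for illustration.

```python
def serial_add(stream_a, stream_b):
    """Add two low-bit-first streams, consuming one bit of each per clock cycle."""
    carry = 0                      # carry register, cleared before the first cycle
    out = []
    for a, b in zip(stream_a, stream_b):
        out.append(a ^ b ^ carry)                 # current sum bit Yn
        carry = (a & b) | (carry & (a ^ b))       # carry Cn+1 for the next cycle
    return out, carry


def split_fraction_first(sum_stream, frac_bits):
    """The first frac_bits outputs are the fractional part, the rest the integer part."""
    return sum_stream[:frac_bits], sum_stream[frac_bits:]


a_stream = [1, 0, 1, 0, 1, 1, 1, 0]   # 0111.0101 read from right to left
b_stream = [1, 0, 0, 1, 1, 1, 0, 0]   # 0011.1001 read from right to left
sum_stream, final_carry = serial_add(a_stream, b_stream)
fraction, integer = split_fraction_first(sum_stream, 4)
```

Read from right to left, `sum_stream` is the sum 1010.1110: the first four output bits form the fraction 1110 and the last four form the integer 1010.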

The serial-to-parallel conversion register 403 serially obtains the fractional part of the first data and outputs the fractional part in parallel to the nonlinear mapping unit 404. Specifically, the fractional part may be input to the serial-to-parallel conversion register 403 in right-to-left order. If the fractional part has n bits, the precision configuration controller 402 can ensure that all fractional bits have been saved in the serial-to-parallel conversion register 403 after n clock cycles, after which the fractional part can be output in parallel.

The nonlinear mapping unit 404 obtains, in parallel, the fractional part of the first data output by the serial-to-parallel conversion register 403, performs the first nonlinear mapping on the fractional part of the first data to obtain third data, and outputs the third data to the shift register 406. The first nonlinear mapping is as described above and, to avoid repetition, is not described again here.

The shift controller 405 obtains the integer part of each first data and outputs a control signal to the shift register 406 according to that integer part. Here, the shift controller 405 may also be called an integer floating-point shift controller.

Specifically, if the value of the integer part of the first data is k, where k is a positive integer, the control signal may be k clock signals. Alternatively, if the binary representation of the integer part of the first data has j bits, where j is a positive integer, the control signal may be j clock signals.

The shift register 406 shifts the obtained third data according to the control signal output by the shift controller 405 to obtain the second data. Here, the shift register may also be called a fractional shift register.
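The division of work between the nonlinear mapping unit and the shift register can be motivated by a short identity. It assumes, as stated for the method, that the first nonlinear mapping is an exponential transformation with base 2. The symbols s (the first data) and f (its fractional part) are introduced here for illustration; k is the value of the integer part, as in the text.

```latex
s = k + f, \qquad k \in \mathbb{Z}_{\ge 0}, \quad 0 \le f < 1
\quad\Longrightarrow\quad
2^{s} = 2^{k+f} = 2^{f} \cdot 2^{k}
```

Under this reading, the third data corresponds to a fixed-point approximation of 2^f, which depends only on the few fractional bits, and multiplying by 2^k is a left shift by k positions, which gives the second data.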

In one possible implementation, when the control signal is k clock signals, the shift register 406 may output the third data serially and, after the third data has been output, serially output k zeros according to the k clock signals, thereby completing the shift of the third data. This makes the shift operation simpler and helps reduce power consumption.
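A minimal sketch of this serial variant follows. It assumes the third data is output high bit first, so that k trailing zeros are equivalent to a left shift by k; the text does not state the output order, and the function names are hypothetical.

```python
def shift_by_trailing_zeros(third_data_bits, k):
    """Output the data bits (high bit first), then one zero per clock signal."""
    return list(third_data_bits) + [0] * k


def bits_to_int(bits_high_first):
    """Interpret a high-bit-first bit list as an unsigned integer."""
    value = 0
    for bit in bits_high_first:
        value = (value << 1) | bit
    return value


# Illustrative values: third data 1111, integer part k = 10.
second_data_bits = shift_by_trailing_zeros([1, 1, 1, 1], 10)
second_data = bits_to_int(second_data_bits)
```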

In another possible implementation, the shift register 406 may shift the third data left by k bits to obtain the second data, and may then output the second data in parallel.

Specifically, when the control signal is k clock signals, the shift register 406 may shift the third data left k times according to the k clock signals, one bit each time, to complete the shift of the third data.

When the control signal is j clock signals, the shift register 406 may shift the third data left at most j times according to the j clock signals to complete the shift of the third data. Specifically, each clock signal of the j clock signals corresponds to one bit of the binary number of the integer part. In this case, the shift amount differs from step to step and is a power of 2, such as 1, 2, 4 or 8. That is, if a given bit of the binary number is non-zero, the third data is shifted by an amount equal to the weight of that bit; if the bit is 0, no shift is performed for that bit. This reduces the number of shift operations, which helps reduce system power consumption and improve system performance.

In a specific implementation, the i-th clock signal of the j clock signals may correspond to the i-th bit of the binary number counted from low to high, where i is a positive integer and 1≤i≤j. When the i-th bit of the binary number is non-zero, the shift register 406 shifts the third data left by 2^(i-1) bits according to the i-th clock signal; when the i-th bit of the binary number is 0, the shift register 406 does not shift the third data.
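The j-clock-signal scheme can be sketched as a loop over the bits of the integer part, shifting by the weight of each non-zero bit. The function name `binary_weighted_shift` and the example values (third data 1111, integer part 1010) are illustrative assumptions.

```python
def binary_weighted_shift(third_data, integer_part, j):
    """Shift left by `integer_part` using at most j steps of 1, 2, 4, ... bits."""
    value = third_data
    steps = []
    for i in range(1, j + 1):                  # i-th clock signal, low bit first
        if (integer_part >> (i - 1)) & 1:      # non-zero bit: shift by its weight
            amount = 1 << (i - 1)              # 2^(i-1)
            value <<= amount
            steps.append(amount)
        # zero bit: no shift on this clock signal
    return value, steps


second_data, steps = binary_weighted_shift(0b1111, 0b1010, j=4)
```

For the integer part 1010, only two shift steps (2 and 8 bits) are needed instead of ten single-bit shifts.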

In a specific implementation, the shift register 406 may include multiple flip-flops whose total word length is sufficient to cover the size of the data after the nonlinear transformation. Here, the input of each flip-flop is no longer connected directly only to the preceding flip-flop. Instead, it is connected to the output of a multiplexer (MUX), and the inputs of that MUX come from the flip-flops located 2^u positions before this flip-flop, where u is an integer greater than or equal to 0.

Fig. 5 shows a schematic diagram of a specific shift register of the embodiment of the present application. The shift register is a 16-bit shift register including 16 flip-flops, namely D0, D1, ..., D14, D15. The input of each flip-flop comes from the output of the 1st, 2nd, 4th or 8th flip-flop before it. For example, through MUX1 the 16th flip-flop D15 can take as its input the output of one of D14, D13, D11 and D7, thereby realizing a data shift of 1, 2, 4 or 8 bits. As another example, through MUX2 the 15th flip-flop D14 can take as its input the output of one of D13, D12, D10 and D6, thereby realizing a data shift of 1, 2, 4 or 8 bits. As a further example, through MUX3 the 6th flip-flop D5 can take as its input the output of one of D4, D3 and D1, thereby realizing a data shift of 1, 2 or 4 bits; in this case, one input of MUX3 is 0. The other bits are connected in the same way. For low-order flip-flops such as D0, D1 and D2, the inputs of their MUXes may include 0.

Specifically, the switching of the MUXes can be controlled by the j clock signals described above. As a specific example, when the binary representation of the integer part is 1011, the shift controller 405 may serially obtain each bit of the integer part; for example, the shift controller 405 may obtain the four bits 1, 1, 0, 1 in order from low to high.

When the shift controller 405 obtains the "1" in the first bit counted from low to high, it may output the 1st clock signal to the shift register 406. This clock signal may be a high-level clock signal, instructing the shift register to shift the third data left by 1 bit. The 1st clock signal controls, for example, the MUX of flip-flop D1 so that D0 becomes the input of D1.

When the shift controller 405 obtains the "1" in the second bit counted from low to high, it may output the 2nd clock signal to the shift register 406. This clock signal may be a high-level clock signal, instructing the shift register to shift the third data left by 2 bits. The 2nd clock signal controls, for example, the MUX of flip-flop D3 so that D1 becomes the input of D3.

When the shift controller 405 obtains the "0" in the third bit counted from low to high, it may output the 3rd clock signal to the shift register 406. This clock signal may be a low-level clock signal, instructing the shift register not to shift the third data.

When the shift controller 405 obtains the "1" in the fourth bit counted from low to high, it may output the 4th clock signal to the shift register 406. This clock signal may be a high-level clock signal, instructing the shift register to shift the third data left by 8 bits. The 4th clock signal controls, for example, the MUX of flip-flop D11 so that D3 becomes the input of D11. At this point, all non-zero bits have been traversed, and the third data has been shifted left by 11 bits in total (that is, 8+2+1) to obtain the second data.
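The behaviour of the 16-flip-flop register for the integer part 1011 can be modelled as follows. The class `MuxShiftRegister`, the assumption that all MUXes select the same tap on a given clock signal, and the initial content 1111 are illustrative simplifications rather than a description of the actual circuit in Fig. 5.

```python
WIDTH = 16
TAPS = (1, 2, 4, 8)   # each MUX can select the flip-flop 1, 2, 4 or 8 places below


class MuxShiftRegister:
    def __init__(self, value=0):
        # q[n] models the output of flip-flop Dn.
        self.q = [(value >> n) & 1 for n in range(WIDTH)]

    def clock(self, tap):
        """One clock edge with every MUX selecting the flip-flop `tap` places below.

        Flip-flops that have no such predecessor load 0 instead.
        """
        if tap not in TAPS:
            raise ValueError("tap must be 1, 2, 4 or 8")
        self.q = [self.q[n - tap] if n >= tap else 0 for n in range(WIDTH)]

    def value(self):
        return sum(bit << n for n, bit in enumerate(self.q))


reg = MuxShiftRegister(0b1111)
for i, bit in enumerate([1, 1, 0, 1]):   # integer part 1011, read from low to high
    if bit:                              # high-level clock signal: shift by 2^i
        reg.clock(1 << i)
    # a 0 bit corresponds to a low-level clock signal: no shift
```

After the three shifts of 1, 2 and 8 bits, the four set bits sit in D11 to D14, which is the original content shifted left by 11 positions.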

When the shift register 406 has completed the shift of the third data, the precision configuration controller 402 may send an enable signal to the shift register 406, so that the shift register 406 outputs the shifted result (that is, the second data) to the accumulator 407. This output is the accumulation term.

The accumulator 407 accumulates, one by one, the accumulation terms output by the shift register 406. Optionally, in the embodiment of the present application, the accumulator 407 may be controlled as needed not to accumulate; for example, when the accumulation term is 0, the previous accumulation result is kept unchanged. Alternatively, the accumulator 407 may be controlled as needed to perform subtraction; for example, when the accumulation term is negative, the current accumulation term may be subtracted from the previous accumulation result.
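The three accumulator behaviours (add, hold, subtract) can be sketched as below. Representing a negative accumulation term as a magnitude plus a separate `negative` flag is an assumption made for this illustration; the text does not say how the sign is conveyed.

```python
class Accumulator:
    """Behavioural model of an accumulator that can add, hold or subtract."""

    def __init__(self):
        self.total = 0

    def step(self, term, negative=False):
        if term == 0:
            return self.total                 # hold: previous result unchanged
        if negative:
            self.total -= term                # subtract the magnitude
        else:
            self.total += term                # ordinary accumulation
        return self.total


acc = Accumulator()
results = [acc.step(5), acc.step(0), acc.step(2, negative=True)]
```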

The second nonlinear mapping circuit 408 performs the second nonlinear mapping on the result output by the accumulator 407. In general, the second nonlinear mapping can be customized according to the characteristics of the neural network and the type of the preceding nonlinear transformation (that is, the first nonlinear mapping). As an example, when the first nonlinear mapping is an exponential transformation, the second nonlinear mapping may be a cascade of two nonlinear transformations: one is the original nonlinear transformation of the neural network, such as Sigmoid or ReLU, and the other is the inverse operation of the first nonlinear mapping, that is, a logarithmic transformation.

In addition, in the embodiment of the present application, the data output by the second nonlinear mapping circuit 408 may undergo parallel-to-serial conversion and be sent back to the memory in serial form for the next calculation.

It should be noted that, in the embodiment of the present application, any of the devices 401 to 408 may be replaced by a circuit module or circuit unit capable of performing the function of that device, which is not specifically limited in the embodiment of the present application.

Therefore, in the embodiment of the present application, the adder serially obtains the input data and the weight parameter and performs a serial addition operation on them to obtain at least one first data; the first nonlinear mapping circuit then performs the first nonlinear mapping on the at least one first data to obtain at least one second data; the accumulation circuit accumulates the at least one second data; and the second nonlinear mapping circuit performs the second nonlinear mapping on the accumulation result output by the accumulation circuit. On this basis, by serializing the input of the adder's operands and the calculation process, the embodiment of the present application transfers the handling of multi-precision compatibility to the counting of clock ticks, so a neural network computing architecture compatible with multiple precisions can be realized.

The circuit for data processing in a neural network system provided by the embodiment of the present application can be applied in cloud server scenarios. It can also be an independent processing chip in a terminal (such as a mobile phone), or a module in a terminal processor chip (for example, a module implemented on an ASIC). Specifically, the information input of the circuit may come from speech, images, natural language or other information that requires intelligent processing, and the data to be processed by the neural network operation may be formed through the necessary preprocessing (such as sampling, analog-to-digital conversion and feature extraction). In addition, the information of the circuit may be output to other subsequent processing modules or software, for example as graphics or in another understandable and usable form of presentation. In the cloud application form, the processing units before and after the data processing circuit in the neural network system may, for example, be provided by other server computing units. In the terminal application environment, the processing units before and after the data processing circuit may be implemented by other software and hardware parts of the terminal (for example, sensors and interface circuits), which is not limited in the embodiment of the present application.

Below, taking operands with 8-bit precision as an example, the data processing in the neural network of the embodiment of the present application is illustrated.

Step 1, two operands A and B are read from the memory bit by bit at the same time, where A=0111.0101 and B=0011.1001; A and B each have a 4-bit integer part and a 4-bit fractional part. One bit is read each time in right-to-left order, giving 10101110 and 10011100 in turn, and A and B are read out completely after 8 clock cycles. The precondition for serial reading is that the data were stored in the memory in serial form. For example, the first bits of A and B are stored in the same byte and the memory is accessed by byte, so when this byte is read from the memory the first bits of A and B are read out together.

Step 2, the two serial operands (the two operands read in step 1) are sent to the serial adder, which completes a one-bit addition. The serial adder is initialized by the precision configuration controller, which clears the register the full adder uses to hold the low-order carry, indicating that there is initially no carry. When the operation starts, the serial adder first adds the first bits, 1+1=0, with a carry of 1. It then adds the second bits, 0+0=0, which combined with the carry from the first bit gives an output of 1 and a carry of 0. Continuing in this way, the addition of all 8 bits is completed; the serial output is 01110101, which represents 1010.1110.

Step 3, the result 1010.1110 of the serial adder is divided into two parts according to the output order: the fractional part 1110 is output first, and the integer part 1010 is output afterwards.

Step 4, the fractional part (1110) enters the fractional-bit serial-to-parallel conversion register bit by bit and is then output in parallel. The serial-in, parallel-out process is controlled by the precision configuration controller. Since there are 4 fractional bits, the precision configuration controller can make the serial-to-parallel conversion register output the 4 fractional bits in parallel after 4 clock cycles of input.

Step 5, the integer part (1010) enters the integer floating-point shift controller, which controls the shift amount of the fractional shift register according to the input integer value; the shift amount corresponding to 1010 is 8+2=10. The bit width of the integer floating-point shift controller is given by the precision configuration controller, and the integer floating-point shift controller is also used to control the start and end of the shift.

Step 6, the content of the fractional-bit serial-to-parallel conversion register is input to nonlinear transformation unit 1 and, after the nonlinear transformation, is output to the fractional shift register. The nonlinear transformation may be an exponential transformation; for example, the transformed result is 1111. It may be implemented with a look-up table or a combinational logic circuit.

Step 7, the data 1111 output by nonlinear transformation unit 1 is input to the fractional shift register. The fractional shift register is controlled to shift according to the value 10 of the integer floating-point shift controller.

Step 8, the result of the shift is 11110000000000 (shifted left by 10 bits compared with 1111). After the shift is complete, the precision configuration controller outputs an enable signal and the shifted result is output; this output is the accumulation term.

Step 9, the accumulator accumulates the output accumulation terms one by one.

Step 10, nonlinear transformation unit 2 applies another nonlinear transformation to the result from the accumulator. This transformation may be a cascade of two transformations: the first is the original nonlinear transformation of the neural network, such as Sigmoid or ReLU, and the other is a logarithmic transformation (because the preceding nonlinear transformation unit 1 performs an exponential transformation, its inverse is a logarithmic transformation).

Step 11, the output data of nonlinear transformation unit 2 undergoes parallel-to-serial conversion and is sent back to the memory in serial form.

Below, taking operands with 3-bit precision as a further example, the data processing in the neural network of the embodiment of the present application is illustrated.

Step 1, two operands A and B are read from the memory bit by bit at the same time, where A=11.0 and B=01.1; each has a 2-bit integer part and a 1-bit fractional part. One bit is read each time in right-to-left order, giving 011 and 110 in turn, and both operands are read out completely after 3 clock cycles.

Step 2, the two serial operands (the two operands read in step 1) are sent to the serial adder, which completes a one-bit addition. When the operation starts, the first bits are added, 0+1=1, with a carry of 0. The second bits are then added, 1+1=0; since the carry from the first bit is 0, the output is 0 and the carry is 1. Continuing in this way, the addition of all 3 bits is completed; including the final carry, the serial output is 1001, which represents 100.1.

Step 3, the result 100.1 of the serial adder is divided into two parts according to the output order: the fractional part 1 is output first, and the integer part 100 is output afterwards.

Step 4, the fractional part (1) enters the fractional-bit serial-to-parallel conversion register and is then output in parallel. The serial-in, parallel-out process is controlled by the precision configuration controller. Since there is 1 fractional bit, the precision configuration controller can make the serial-to-parallel conversion register output the 1 fractional bit in parallel after 1 clock cycle of input.

Step 5, the integer part (100) enters the integer floating-point shift controller, which controls the shift amount of the fractional shift register according to the input integer value; the shift amount corresponding to 100 is 4.

Step 6, the content of the fractional-bit serial-to-parallel conversion register is input to nonlinear transformation unit 1 and, after the nonlinear transformation, is output to the fractional shift register. The nonlinear transformation may be an exponential transformation; for example, the transformed result is 1.

Step 7, the data 1 output by nonlinear transformation unit 1 is input to the fractional shift register. The fractional shift register is controlled to shift according to the value 4 of the integer floating-point shift controller.

Step 8, the result of the shift is 10000 (shifted left by 4 bits compared with 1). After the shift is complete, the precision configuration controller outputs an enable signal and the shifted result is output; this output is the accumulation term.

Step 9, the accumulator accumulates the output accumulation terms one by one.

Step 10, nonlinear transformation unit 2 applies another nonlinear transformation to the result from the accumulator. This transformation may be a cascade of two transformations: the first is the original nonlinear transformation of the neural network, such as Sigmoid or ReLU, and the other is a logarithmic transformation (because the preceding nonlinear transformation unit 1 performs an exponential transformation, its inverse is a logarithmic transformation).

Step 11, the output data of nonlinear transformation unit 2 undergoes parallel-to-serial conversion and is sent back to the memory in serial form.
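Steps 1 to 8 of both worked examples can be reproduced with one routine in which only the operand width changes, which is the point of the multi-precision design. The sketch below uses ordinary integer addition in place of the bit-serial adder and takes the output of nonlinear transformation unit 1 as a given value (1111 and 1, the example values in the text) rather than computing it. The function name `process_operand_pair` is hypothetical.

```python
def process_operand_pair(a_text, b_text, mapped_fraction):
    """Follow steps 1 to 8 for one (A, B) pair.

    a_text, b_text: fixed-point binary strings such as "0111.0101".
    mapped_fraction: output of nonlinear transformation unit 1, taken
        directly from the example in the text (not computed here).
    Returns (fractional part of the sum, shift amount, accumulation term).
    """
    frac_bits = len(a_text.split(".")[1])
    a = int(a_text.replace(".", ""), 2)
    b = int(b_text.replace(".", ""), 2)
    total = a + b                                  # steps 1-2: sum of A and B
    fraction = total & ((1 << frac_bits) - 1)      # step 3: fractional part, output first
    shift = total >> frac_bits                     # step 5: integer part = shift amount
    term = mapped_fraction << shift                # steps 7-8: shifted accumulation term
    return fraction, shift, term


example_8bit = process_operand_pair("0111.0101", "0011.1001", 0b1111)
example_3bit = process_operand_pair("11.0", "01.1", 0b1)
```

The 8-bit call yields the fraction 1110, a shift of 10 and the term 11110000000000; the 3-bit call yields the fraction 1, a shift of 4 and the term 10000, matching the two walk-throughs.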

Therefore, by serializing the input of the adder's operands and the calculation process, the embodiment of the present application transfers the handling of multi-precision compatibility to the counting of clock ticks. On this basis, a neural network computing architecture compatible with multiple precisions can be realized.

Fig. 6 shows a schematic flowchart of a method for data processing in a neural network system provided by an embodiment of the present application. The method is performed by a data processing circuit, which may include a serial addition circuit, a first nonlinear mapping circuit, an accumulation circuit and a second nonlinear mapping circuit. The method comprises:

610, serially obtaining, by the serial addition circuit, each input data of at least one input data and the weight parameter corresponding to each input data, and performing a serial addition operation on each input data and the weight parameter corresponding to each input data to obtain at least one first data;

620, performing, by the first nonlinear mapping circuit, a first nonlinear mapping on each first data of the at least one first data to obtain at least one second data, wherein the first nonlinear mapping is an exponential transformation with base 2;

630, accumulating, by the accumulation circuit, the at least one second data;

640, performing, by the second nonlinear mapping circuit, a second nonlinear mapping on the accumulation result output by the accumulation circuit to obtain output data, wherein the second nonlinear mapping is determined according to the nonlinear mapping of the neural network and the first nonlinear mapping.
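The four steps can be summarised numerically in floating point. The sketch below rests on several assumptions that are not spelled out in these steps: the inputs and weights are treated as base-2 logarithms of positive values, signs are ignored, the network's own nonlinearity is taken to be Sigmoid, and the cascade in step 640 applies the network nonlinearity first and the logarithm second. The function names are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def neuron_log_domain(inputs, weights, activation=sigmoid):
    """Steps 610 to 640 in floating point, for log-domain inputs and weights."""
    first = [a + b for a, b in zip(inputs, weights)]   # 610: addition per pair
    second = [2.0 ** s for s in first]                 # 620: base-2 exponential
    accumulated = sum(second)                          # 630: accumulation
    return math.log2(activation(accumulated))          # 640: activation, then log2


# Linear-domain inputs 2 and 4 with weights 1 and 0.5 have dot product 4;
# their base-2 logarithms are used below.
output = neuron_log_domain([1.0, 2.0], [0.0, -1.0])
```

Under these assumptions the addition in step 610 plays the role of a multiplication in the linear domain, and the logarithm in step 640 returns the result to the same representation as the inputs so that it can feed the next calculation.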

Optionally, the data processing circuit further includes a serial-parallel conversion circuit, and the method further comprises:

serially obtaining, by the serial-parallel conversion circuit, at least part of the data of each first data, and outputting the at least part of the data of each first data in parallel to the first nonlinear mapping circuit.

Optionally, the first nonlinear mapping circuit includes a nonlinear mapping unit, a shift control circuit and a shift register circuit, wherein performing, by the first nonlinear mapping circuit, the first nonlinear mapping on each first data of the at least one first data to obtain at least one second data comprises:

obtaining in parallel, by the nonlinear mapping unit, the fractional part of each first data, performing the first nonlinear mapping on the fractional part of each first data to obtain third data, and outputting the third data to the shift register circuit;

obtaining, by the shift control circuit, the integer part of each first data, and outputting a control signal to the shift register circuit according to the integer part of each first data;

shifting, by the shift register circuit, the third data according to the control signal to obtain the at least one second data.

Optionally, the control signal is k clock signal, and the shift register circuit is according to the control signal to institute It states third data to be shifted, obtains at least one described second data, comprising:

Third data described in the shift register circuit Serial output, and after exporting the third data according to K clock signal Serial output k 0, obtain second data, wherein k is taking for the integer part of first data Value, k is positive integer.

Optionally, the shift register circuit shifts the third data according to the control signal, obtains institute State at least one second data, comprising:

The shift register circuit moves to left k to the third data according to the control signal, obtains second number According to, wherein k is the value of the integer part of first data, and k is positive integer.

Optionally, the control signal is k clock signal.

Optionally, the control signal is j clock signal, wherein

I-th of clock signal in the j clock signal is supreme by low level with the binary number of the integer part The i-th bit of position is corresponding, and j is the digit of the binary number of the integer part, and i, j are positive integer, 1≤i≤j;

The shift register circuit moves to left k to the third data according to the control signal, obtains second number According to, comprising:

When the i-th bit of the binary number is non-zero, the shift register circuit is right according to i-th of clock signal The third data move to left 2i-1Position;

When the i-th bit of the binary number is 0, the shift register circuit does not shift the third data.
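The bit-weighted scheme above can be sketched as follows. This is a software illustration only; the function name and argument names are hypothetical, and each loop iteration stands in for one of the j clock signals.

```python
def shift_by_binary_weights(third_data: int, integer_part: int, j: int) -> int:
    """Shift third_data left by integer_part using j bit-weighted steps.

    Step i (1 <= i <= j) examines the i-th bit of integer_part, counted from
    the least significant bit, and shifts by 2**(i-1) when that bit is set.
    """
    result = third_data
    for i in range(1, j + 1):
        bit = (integer_part >> (i - 1)) & 1
        if bit:
            result <<= 1 << (i - 1)  # shift left by 2**(i-1) positions
        # when the bit is 0, this step leaves the data unshifted
    return result
```

Because the per-step shift amounts sum to the value of the integer part, the total shift equals k while using only j steps (the bit width of k) rather than k single-bit steps.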

Specifically, the data processing circuit is, for example, the circuit shown in Fig. 3 or Fig. 4 above. That is, the data processing circuit in Fig. 3 or Fig. 4 can implement each process of the method embodiment shown in Fig. 6. For details of the circuit, refer to the description above; to avoid repetition, they are not described again here.

Therefore, in the embodiments of the present application, the adder serially obtains the input data and the weight parameters and performs a serial addition on them to obtain at least one first data; the first nonlinear mapping circuit performs the first nonlinear mapping on the at least one first data to obtain at least one second data; the accumulation circuit accumulates the at least one second data; and the second nonlinear mapping circuit performs the second nonlinear mapping on the accumulation result output by the accumulation circuit. Because the input and the computation of the adder's operands are serialized, handling multiple precisions reduces to counting clock ticks, so a neural network computing architecture compatible with multiple precisions can be realized.

It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present application in any way.

It should also be understood that descriptions such as "first" and "second" in the embodiments of the present application are used only for illustration and to distinguish the objects described. They imply no order, do not specifically limit the number of devices in the embodiments of the present application, and do not constitute any limitation on the embodiments of the present application.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not described again here.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.

The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
