Processing Non-Contiguous Memory as Contiguous Memory to Improve the Performance of a Neural Network

Document No.: 1760330    Publication date: 2019-11-29

Note: This technology, "Processing Non-Contiguous Memory as Contiguous Memory to Improve the Performance of a Neural Network," was created by G. Petre, C. B. McBride, A. A. Ambardekar, K. D. Cedola, B. Bobrov, and L. M. Wall on 2018-04-06. Its main content is as follows: The performance of a neural network (NN) may be limited by the number of operations being performed. Using a line buffer that shifts memory blocks by a selected shift stride for cooperating neurons, data that operatively resides in memory and would otherwise require multiple write cycles into the cooperating line buffer can be processed as if within a single line-buffer write cycle, thereby enhancing the performance of the NN/DNN. A controller and/or an iterator can generate one or more instructions having a memory-block shift value for communication to the line buffer. The shift value can be calculated using various characteristics of the input data as well as of the NN/DNN, including the data's dimensions. The line buffer can read the data for processing, shift the memory block's data, and write the data into the line buffer for subsequent processing.

1. A system for reducing power consumption in a neural network environment, the system comprising:

at least one processor;

at least one line buffer, the at least one line buffer operable to perform reading and/or writing of data; and

at least one memory in communication with the at least one processor, the at least one memory having computer-readable instructions stored thereupon that, when executed by the at least one processor, cause the at least one processor to:

receive one or more initialization parameters from a cooperating controller component of the neural network environment, the initialization parameters comprising data representative of the dimensions of the data to be processed by the neural network environment and data representative of one or more discontinuities of one or more data elements between one or more rows of the data;

load data from a cooperating memory component of the neural network environment;

calculate, according to the initialization parameters, a shift stride representative of a number of bits by which to shift one or more data elements of the data;

receive one or more instructions from the cooperating controller component of the neural network environment to shift the data elements within the loaded data by the shift stride, to generate shifted data for writing into the at least one line buffer; and

communicate the data written into the at least one line buffer to one or more processing components of the neural network environment for processing.

2. The system of claim 1, wherein application of the shift stride results in single-cycle processing of the line-buffer data in the at least one line buffer.

3. The system of claim 1, wherein the computer-readable instructions further cause the at least one processor to communicate data traversed by a cooperating iterator to the line buffer.

4. The system of claim 3, wherein the computer-readable instructions further cause the at least one processor to traverse the data using one or more sliding windows, the windows operative to select one or more data elements of a data volume as the one or more portions communicated to the one or more processing components.

5. The system of claim 4, wherein the computer-readable instructions further cause the at least one processor to traverse the loaded data using one or more sliding windows that straddle a data-dimension boundary of the loaded data.

6. The system of claim 1, wherein the computer-readable instructions further cause the at least one processor to insert one or more data paddings into the loaded data.

7. The system of claim 1, wherein the computer-readable instructions further cause one or more additional bits to be processed by the one or more processing units to generate output data for writing into the at least one memory, and to discard the processed one or more additional bits when performing the write of the output data.

8. A computer-implemented method for reducing power consumption in a neural network environment, comprising:

receiving one or more initialization parameters from a cooperating controller component of the neural network environment, the initialization parameters comprising data representative of the dimensions of the data to be processed by the neural network environment and data representative of one or more discontinuities of one or more data elements between one or more rows of the data;

loading data from a cooperating memory component of the neural network environment;

iterating over the loaded data, by a cooperating iterator component of the neural network environment, according to a selected iteration operation;

calculating, according to the initialization parameters, a shift stride representative of a number of bits by which to insert one or more data elements of the data;

receiving, according to the one or more initialization parameters, one or more instructions from the cooperating controller component of the neural network environment, and applying the shift stride to the loaded data to generate directed line-buffer data and writing the directed line-buffer data into a line buffer; and

communicating the data written into the line buffer to one or more processing components of the neural network environment for processing.

9. The computer-implemented method of claim 8, wherein one or more portions of the loaded data are unequal portions.

10. The computer-implemented method of claim 8, wherein a sliding window is operative to straddle a data-dimension boundary of the data.

11. The computer-implemented method of claim 8, further comprising:

inserting a padding sub-volume into the loaded data, the loaded data being defined by the one or more instructions received from the cooperating controller component and by the received one or more initialization parameters.

12. The computer-implemented method of claim 8, further comprising:

processing the data written into the line buffer by the one or more processing units to generate output data.

13. The computer-implemented method of claim 12, further comprising:

processing the output data by an output iterator component to discard the processed additional bits inserted as a result of the one or more shift-stride bits.

14. The computer-implemented method of claim 8, further comprising:

clearing the line buffer of the written directed line-buffer data, so as to receive additional directed line-buffer data for writing into the line buffer.

15. The computer-implemented method of claim 8, further comprising: writing the directed line-buffer data into a selected number of lines of the line buffer, wherein each line of the line buffer is associated with a cooperating processing unit of the neural network environment.
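For readers tracing the claimed method, the steps of claim 8 can be rendered as a small sketch. Everything below — the parameter names, the stride rule, and the stand-in "processing component" — is a hypothetical illustration under assumed conventions, not the claimed implementation.

```python
def process_blob(loaded, init_params):
    """Hypothetical sketch of the method of claim 8: derive a shift
    stride from the initialization parameters, apply it while writing
    rows into a line buffer, and hand each line to a processing unit."""
    width = init_params["width"]            # elements per logical row
    gap = init_params["row_discontinuity"]  # elements separating rows in memory

    # Calculate the shift stride from the initialization parameters
    # (assumed rule: advance past the inter-row discontinuity so that
    # logically adjacent rows read out as if contiguous).
    shift_stride = width + gap

    # Apply the stride while writing one logical row per line of the
    # line buffer ("directed line buffer data").
    lines = [loaded[r * shift_stride : r * shift_stride + width]
             for r in range(init_params["height"])]

    # Communicate each written line to a processing component (stubbed
    # here as a per-line sum standing in for a neuron).
    return [sum(line) for line in lines]
```

Under these assumptions, a 4-wide, 3-row blob whose rows are separated by one stale element in memory is consumed with a single linear pass rather than three separate reads.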

Background

In an artificial neural network (NN), a neuron is the base unit used to model a biological neuron in the brain. The model of an artificial neuron includes the inner product of an input vector with a weight vector added to a bias, with a non-linearity applied. For a deep neural network (DNN) (e.g., as expressed by an exemplary DNN module), a neuron can be closely mapped to an artificial neuron. Operationally, a DNN performs at its best when its neurons process data continuously and the situation in which a neuron, or a group of neurons, is left not processing data during a processing cycle is avoided.

When processing data across an NN or DNN, the controller performing the exemplary processing operations is required to iterate over large amounts of data in order to apply specific operations. Such requirements can impact overall NN or DNN performance, resulting in critical latency that compromises the desired processing objectives (e.g., identifying objects and/or object characteristics in exemplary input data — images, sound, geographic coordinates, etc.). Typically, some existing NNs and DNNs expend avoidable processing time (e.g., floating-point/fixed-point operations per second (GFlops/s)) and memory space (e.g., bytes transferred per second (GBytes/s)) when performing various operations, including memory reads and writes, against the NN/DNN's various cooperating memory components (e.g., line buffers). Specifically, current practice does not identify the key features of the input data and/or does not provide the cooperating components of the NN or DNN with instructions regarding how best to manage/direct the read/write operations on the input data within the NN or DNN's cooperating memory components (including line buffers) so as to avoid such performance issues. Part of the performance impact associated with inefficient data processing in an NN or DNN stems from inefficient handling of data among the neural processing components of the NN or DNN. Such inefficient data management and processing requires additional, often avoidable, compute/neuron processor operations, further straining overall NN/DNN performance.

A more advantageous NN/DNN would deploy an instruction set that directs the NN/DNN's cooperating memory components, particularly the line buffers, with a shift stride by which to operatively shift memory blocks for each cooperating neuron. Operationally, shifting the memory block can permit data for multiple overlapping/cooperating sliding windows to be extracted from a single memory block in a single processing cycle. In this manner, the data in memory can be treated as a contiguous memory block for processing, thereby improving the overall performance of the NN/DNN.

It is with respect to these and other considerations that the disclosure presented herein is made.

Summary of the Invention

The techniques described herein provide for the virtualization of one or more hardware iterators to be utilized in an exemplary neural network (NN) and/or deep neural network (DNN) environment, wherein a cooperating line-buffer component operatively allows for improved overall performance and optimized memory management during data processing. In an illustrative implementation, an exemplary DNN environment can comprise one or more processing blocks (e.g., computer processing units — CPUs), a memory controller, a line buffer, a high-bandwidth fabric (e.g., a local or external fabric) (e.g., a data bus passing data and/or data elements between an exemplary DNN module and the cooperating components of the DNN environment), an operation controller, and a DNN module. In the illustrative implementation, the exemplary DNN module can comprise an exemplary DNN state controller, a descriptor list controller (DLC), dMA (DDMA), DMA streaming activations (DSA), an operation controller, a load controller, and a store controller.

In an illustrative operation, the operation controller of the NN/DNN environment can operatively process large amounts of data in order to apply one or more desired data-processing operations (e.g., convolution, max pooling, scalar multiply/add, summation, fully connected, etc.). In the illustrative operation, a participating user can specify the dimensions of the data being processed, as well as a configuration for how that data is to be processed, for use by the NN/DNN computing environment, through the use of a line buffer that operatively receives, from a cooperating operation controller/iterator, one or more instructions to perform various operations (including, but not limited to, data-processing and memory-management operations).

In an illustrative implementation, the data to be processed by the NN/DNN environment can be represented as a blob. Generally, a blob represents the data in memory that needs to be iterated. Each blob can maintain a logically mapped shape defined by various dimensions, such as width, height, number of channels, number of kernels, and other available dimensional units. In an illustrative operation, the operation controller can traverse a multi-dimensional blob (e.g., as defined by a logical data mapping) or a smaller N-dimensional slice of such a blob, where N is the number of dimensions (e.g., N = 3 for a 3D blob representing an image having a width, a height, and a number of channels) (e.g., using one or more hardware or virtualized iterators). The traversed blob can be communicated, along with one or more instructions, to a cooperating line buffer to manage the reading/writing of the traversed data in the line buffer. The blob of data can be processed by various components of the NN/DNN, including an iterator capable of processing input data and an iterator capable of processing output data, which data can illustratively be output from one or more cooperating processing units of the NN/DNN.
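As a concrete illustration of the logical mapping just described, a 3D blob's (channel, row, column) coordinates reduce to a single linear offset that an iterator can walk. The row-major layout used here is a common convention assumed for the sketch, not a layout mandated by the text.

```python
def blob_offset(c, y, x, height, width):
    """Linear memory offset of element (channel c, row y, column x)
    in a row-major 3-D blob of shape (channels, height, width)."""
    return (c * height + y) * width + x

def iterate_blob(channels, height, width):
    """Order in which a simple hardware-style iterator would visit the
    blob: innermost over width, then height, then channels."""
    return [blob_offset(c, y, x, height, width)
            for c in range(channels)
            for y in range(height)
            for x in range(width)]
```

For a blob of height 4 and width 5, channel 1 begins at offset 20, which is exactly the kind of dimension-derived address a cooperating iterator would hand to the line buffer.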

Illustratively, the line buffer can operatively shift the data of a memory block by the shift stride of one or more cooperating processing units (e.g., neurons). The line buffer can be configured to store data in a predefined number of rows/lines (e.g., 64 rows/lines), each of which can receive a selected amount of data (e.g., 32 bytes of data can be received for each row of the line buffer). The line buffer can operate to shift the memory-block data according to a calculated shift stride, which illustratively can be based on characteristics of the input data (e.g., shift stride, continuity, height, width, kernels, etc.) (e.g., if the stride value is 0, the lines of the line buffer can hold identical data, and if the stride value is 1, the data can be shifted by 1 to allow for more efficient read/write operations in the line buffer). Operatively, the line buffer can read additional data from a cooperating memory block operatively resident in memory, allowing for a single read from the cooperating memory such that the memory block's data lies in a contiguous memory block.
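The stride-0 versus stride-1 behavior described above can be mimicked with a toy model. The line count and line width are shrunk from the 64-line, 32-byte example for readability, and the fill rule is an assumption for illustration.

```python
def fill_line_buffer(data, shift_stride, line_count=4, line_width=8):
    """Write `data` into each line of a toy line buffer, offsetting each
    successive line's read position by `shift_stride` elements:
    stride 0 -> every line holds identical data;
    stride 1 -> each line is shifted by one element."""
    return [data[i * shift_stride : i * shift_stride + line_width]
            for i in range(line_count)]
```

With stride 1, four overlapping windows are produced from a single pass over the source block, which is the single-read, contiguous-memory behavior the passage attributes to the shift stride.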

Illustratively, the data written into the line buffer can be operatively processed by one or more cooperating processing units (e.g., one or more neurons) to generate output data. The output data can also include one or more data elements — data elements inserted during the exemplary shift-stride operation — that can be discarded when the generated output data is written to a cooperating memory component.

It should be appreciated that, although described with respect to a system, the above-described subject matter may also be implemented as a computer-controlled apparatus, a computer process, a computing system, or an article of manufacture such as a computer-readable medium and/or a dedicated chipset. These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings. This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description.

This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all of the disadvantages noted in any part of this disclosure.

Brief Description of the Drawings

The Detailed Description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same reference numbers in different figures indicate similar or identical items. References made to individual items of a plurality of items can use a reference number with a letter of a sequence of letters to refer to each individual item. Generic references to the items may use the specific reference number without the sequence of letters.

Fig. 1 illustrates a block diagram of an exemplary neural network computing environment in accordance with the herein described systems and methods.

Fig. 2 illustrates a block diagram of an exemplary neural network environment utilizing a directed line buffer.

Fig. 3 illustrates a block diagram of exemplary input data represented in an illustrative logical data mapping, in accordance with the herein described systems and methods.

Fig. 4 illustrates a block diagram of exemplary input data represented in an illustrative logical data mapping, showing the use of illustrative n sliding windows operative to straddle one or more lines of the illustrative logical data mapping.

Fig. 5 illustrates a block diagram of exemplary input data represented in an illustrative logical data mapping, in accordance with the herein described systems and methods, showing the use of illustrative n sliding windows operative to straddle one or more lines of the illustrative logical data mapping, operative to allow for data padding as a processing enhancement.

Fig. 6 illustrates a block diagram of exemplary input data represented in an illustrative logical data mapping, in accordance with the herein described systems and methods, showing the use of a shift stride to allow for contiguous memory read/write operations in a directed line buffer.

Fig. 6A illustrates a block diagram of exemplary output data represented in an illustrative logical data mapping, showing the discarding of shift-stride data bits inserted during an exemplary output write operation.

Fig. 7 is a flow diagram of an illustrative process for processing data in an exemplary neural network environment utilizing a directed line buffer, in accordance with the herein described systems and methods.

Fig. 8 shows additional details of an illustrative computer architecture for a computer capable of executing the herein described methods.

Fig. 9 shows additional details of illustrative computing devices cooperating in accordance with the herein described systems and methods.

Detailed Description

The following Detailed Description describes techniques for the virtualization of one or more hardware iterators to be utilized in an exemplary neural network (NN) and/or deep neural network (DNN) environment, wherein a cooperating line-buffer component operatively allows for improved overall performance and optimized memory management during data processing. In an illustrative implementation, an exemplary DNN environment can comprise one or more processing blocks (e.g., computer processing units — CPUs), a memory controller, a line buffer, a high-bandwidth fabric (e.g., a local or external fabric) (e.g., a data bus passing data and/or data elements between an exemplary DNN module and the cooperating components of the DNN environment), an operation controller, and a DNN module. In the illustrative implementation, the exemplary DNN module can comprise an exemplary DNN state controller, a descriptor list controller (DLC), dMA (DDMA), DMA streaming activations (DSA), an operation controller, a load controller, and a store controller.

It should be appreciated that the described subject matter may be implemented as a computer-controlled apparatus, a computer process, a computing system, or an article of manufacture such as a computer-readable storage medium. Among many other benefits, the techniques herein improve efficiencies with respect to a wide range of computing resources. For instance, determining a shift stride can reduce the number of computing cycles needed to perform a number of complex tasks, such as facial recognition, object recognition, image generation, etc.

In addition, improved human interaction can be achieved by the introduction of more accurate and faster completion of such tasks. Moreover, the use of a shift stride can reduce network traffic, reduce power consumption, and reduce memory utilization. Other technical effects, beyond those mentioned herein, can also be realized from implementations of the technologies disclosed herein.

In an illustrative operation, the operation controller of the NN/DNN environment can operatively process large amounts of data in order to apply one or more desired data-processing operations (e.g., convolution, max pooling, scalar multiply/add, summation, fully connected, etc.). In the illustrative operation, a participating user can specify the dimensions of the data being processed, as well as a configuration for how that data is to be processed, for use by the NN/DNN computing environment, through the use of a line buffer that operatively receives, from a cooperating operation controller/iterator, one or more instructions to perform various operations (including, but not limited to, data-processing and memory-management operations).

In an illustrative implementation, the data to be processed by the NN/DNN environment can be represented as a blob. Generally, a blob represents the data in memory that needs to be iterated. Each blob can maintain a logically mapped shape defined by various dimensions, such as width, height, number of channels, number of kernels, and other available dimensional units. In an illustrative operation, the operation controller can traverse a multi-dimensional blob (e.g., as defined by a logical data mapping) or a smaller N-dimensional slice of such a blob, where N is the number of dimensions (e.g., N = 3 for a 3D blob representing an image having a width, a height, and a number of channels) (e.g., using one or more hardware or virtualized iterators). The traversed blob can be communicated, along with one or more instructions, to a cooperating line buffer to manage the reading/writing of the traversed data in the line buffer. The blob of data can be processed by various components of the NN/DNN, including an iterator capable of processing input data and an iterator capable of processing output data, which data can illustratively be output from one or more cooperating processing units of the NN/DNN.

Illustratively, the line buffer can operatively treat the shifting of a memory block's data as a shift stride that displaces one or more values within the memory block written into the line buffer at any position. The line buffer can be configured to store data in a predefined number of rows/lines (e.g., 64 rows/lines), each of which can receive a selected amount of data (e.g., 32 bytes of data can be received for each row of the line buffer). The line buffer can operate to shift the memory-block data according to the shift stride, which illustratively can be based on characteristics of the input data (e.g., shift stride, continuity, height, width, kernels, etc.) (e.g., if the stride value is 0, the lines of the line buffer can hold identical data, and if the stride value is 1, the data can be shifted by 1 to allow for more efficient read/write operations in the line buffer). Operatively, the line buffer can read additional data from a cooperating memory block operatively resident in memory, allowing for a single read from the cooperating memory such that the memory block's data lies in a contiguous memory block.

Illustratively, the data written into the line buffer can be operatively processed by one or more cooperating processing units (e.g., one or more neurons) to generate output data. The output data can also include one or more data elements — data elements inserted during the exemplary shift-stride operation — that can be discarded when the generated output data is written to a cooperating memory component.

Neural Network Background:

In artificial neural networks, a neuron is the base unit used to model a biological neuron in the brain. The model of an artificial neuron can include the inner product of an input vector with a weight vector added to a bias, with a non-linearity applied. Comparatively, a neuron in the exemplary DNN module (e.g., 105 of Fig. 1) closely maps to an artificial neuron.

Illustratively, the DNN module can be considered a superscalar processor. Operationally, it can dispatch one or more instructions to multiple execution units, referred to as neurons. The execution units can be "simultaneous dispatch, simultaneous complete," where each execution unit is synchronized with all of the other execution units. The DNN module can be classified as a SIMD (single instruction stream, multiple data stream) architecture.

Turning to the exemplary DNN environment 100 of Fig. 1, DNN module 105 has a memory subsystem with a unique L1 and L2 caching structure. These are not traditional caches, but are designed specifically for neural processing. For convenience, these caching structures have adopted names that reflect their intended purpose. By way of example, the L2 cache 150 can illustratively maintain a selected storage capacity (e.g., one megabyte (1 MB)) with a high-speed private interface operating at a selected frequency (e.g., sixteen gigabits per second (16 GBps)). The L1 cache can maintain a selected storage capacity (e.g., eight kilobytes (8 KB)) that can be split between kernel data and activation data. The L1 cache can be referred to as the line buffer, and the L2 cache can be referred to as BaSRAM.

The DNN module can be a recall-only neural network and can programmatically support a wide variety of network structures. Training for the network can be performed offline in a server farm or data center; the result of training is a set of parameters that can be known as either weights or kernels. These parameters represent a transform function that can be applied to an input, with the result being a classification or a semantically labeled output.

In an illustrative operation, the DNN module can accept planar data as input. Input is not limited to image data; as long as the data presented is in a uniform planar format, the DNN can operate on it.

The DNN module operates on a list of layer descriptors that correspond to the layers of a neural network. Illustratively, the list of layer descriptors can be treated by the DNN module as instructions. These descriptors can be pre-fetched from memory into the DNN module and executed in order.

Generally, there can be two main classes of layer descriptors: 1) memory-to-memory move descriptors, and 2) operation descriptors. Memory-to-memory move descriptors can be used to move data from main memory to the local cache, and from the local cache to main memory, for consumption by the operation descriptors. Memory-to-memory move descriptors follow a different execution pipeline than the operation descriptors: the target pipeline for memory-to-memory move descriptors can be the internal DMA engine, whereas the target pipeline for the operation descriptors can be the neuron processing elements. Operation descriptors are capable of performing many different layer operations.
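The two descriptor classes and their distinct target pipelines can be modeled minimally as follows. The type and field names are illustrative placeholders, not the module's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class LayerDescriptor:
    kind: str                    # "m2m" (memory-to-memory move) or "op" (operation)
    payload: dict = field(default_factory=dict)

def target_pipeline(desc: LayerDescriptor) -> str:
    """Route a descriptor to its execution pipeline: moves go to the
    internal DMA engine, operation descriptors go to the neurons."""
    return "dma_engine" if desc.kind == "m2m" else "neuron_pipeline"
```

In this toy routing, a descriptor list alternating moves and operations would interleave the two pipelines, which is why the text distinguishes their execution paths.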

The output of the DNN is also a blob of data. The output can optionally be streamed to a local cache or streamed to main memory. The DNN module can pre-fetch data as far ahead as the software allows. Software can control pre-fetching by using fencing and by setting dependencies between descriptors. Descriptors that have a dependency set are prevented from advancing until the dependency has been met.

Turning now to Fig. 1, an exemplary neural network environment 100 can comprise various cooperating components, including DNN module 105, cache memory 125 or 150, low-bandwidth fabric 110, bridge component 115, high-bandwidth fabric 120, SOC 130, PCIE "endpoint" 135, Tensilica node 140, memory controller 145, LPDDR4 memory 155, and an input data source 102. Further, as shown, DNN module 105 can also comprise a number of components, including prefetch 105(A), DMA 105(B), register interface 105(D), load/store unit 105(C), layer controller 105(D), save/restore component 105(E), and neurons 105(F). Operationally, the exemplary DNN environment 100 can process data according to a selected specification, wherein the DNN module performs one or more functions as described herein.

Fig. 2 illustrates an exemplary neural network environment 200 operable to employ a directed line buffer 220 as part of data processing. As shown, the exemplary neural network environment 200 (also referred to herein as a computing device or a computing device environment) comprises one or more operation controllers 235 that cooperate with line buffer 220 to provide one or more instructions for data processing. Line buffer 220 can operate to receive data from cooperating external memory component 225 through external fabric 230 and fabric 215, as well as to receive one or more instructions/commands from iterator(s) 240 (e.g., hardware-based and/or virtualized iterators) (e.g., an instruction/command to read data from a cooperating memory component and/or an instruction to write data loaded from the cooperating memory component into the line buffer). Operationally, line buffer 220 can shift data according to a selected stride width, according to one or more instructions received from the one or more operation controllers 235 (also referred to herein as "cooperating controller component 235"). Furthermore, line buffer 220 can cooperate with processing unit(s) (e.g., neuron(s)) to provide the written bit-shifted data for further processing, directly or indirectly, through fabric 215. A neural network environment fabric can be a data bus capable of passing through various data. A directed line buffer can be considered a memory component capable of reading and writing data and/or data elements according to one or more received instructions.

In an illustrative operation, the exemplary neural network environment 200 can operatively process data according to the process described in Fig. 7. Specific to the components described in Fig. 2, these components are merely illustrative, as one of ordinary skill in the art would appreciate that the processing described in Figs. 6 and 7 can also be performed by components other than those illustrated in Fig. 2.

Additionally, as shown in Fig. 2, the exemplary neural network environment can optionally include one or more iterators (e.g., hardware-based and/or virtualized iterators) (as indicated by the dashed lines) that can illustratively operate to iterate input data (not shown) for processing by the one or more neuron processors 205. It is appreciated by one skilled in the art that such optional inclusion of the illustrative one or more iterators is merely illustrative, as the inventive concepts described by the herein disclosed systems and methods are operative in an exemplary neural network environment 200 operating without any iterators.

Fig. 3 illustrates an example logical data mapping 300 for exemplary input data. As shown, data 305 can be represented as data having a certain dimension 340 (e.g., such that the data dimensions, taken as a whole, can define a data volume), comprising channel count 310, height 315, and width 320. According to the herein described systems and methods, data 305 can be portioned and prepared for processing by cooperating n neurons 330, such that a first portion a can be communicated to a first neuron, a second portion b can be communicated to a second neuron, and so forth, until n portions are communicated to n neurons.

In an illustrative operation, the portions of the data 305 can be determined using n sliding windows/kernels 325 based on one or more instructions provided by a cooperating controller component of an exemplary neural network environment (e.g., 200 of Fig. 2). Further, as shown, the input data portions a, b, c, and d can be addressed to physical memory 325 using one or more initialization parameters provided by the cooperating operation controller component (235) of the exemplary neural network environment (e.g., 200 of Fig. 2).
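The apportionment and addressing described above can be sketched as follows. This is a minimal illustration, assuming a row-major physical layout and a roughly even row split per neuron; the function names and the layout are assumptions for illustration, not taken from the figures:

```python
def logical_to_physical(c, y, x, height, width):
    """Row-major flattening of a logical (channel, row, column) coordinate
    into a flat physical memory address."""
    return c * height * width + y * width + x

def partition_rows(total_rows, n_neurons):
    """Split the rows of a volume into n roughly equal portions
    (a, b, c, ...), one portion per cooperating neuron."""
    base, extra = divmod(total_rows, n_neurons)
    parts, start = [], 0
    for i in range(n_neurons):
        size = base + (1 if i < extra else 0)
        parts.append((start, start + size))
        start += size
    return parts
```

A controller component could hand each neuron its `(start, end)` row range together with the flattened base address of that range.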

Fig. 4 shows an exemplary logical data map 400 of exemplary input data (not shown). The exemplary logical data map 400 includes a first line 410 (marked with diagonal hatching) and a second line 420 (shown in dashed lines). Each map line can include a number of sliding windows (e.g., 430, 440, and 450 for the first line 410, and 460, 470, and 480 for the second line 420). Additionally, as shown, the logical data map 400 illustrates the ability of the sliding windows to straddle a data dimension boundary of the input data (e.g., straddling the first line 410 and the second line 420). This ability allows for increased performance, since more data can be more efficiently prepared for subsequent processing by the cooperating neural network processing components (e.g., 205 of Fig. 2).

Fig. 5 is similar to Fig. 4 and is presented to describe the ability of the systems and methods described herein to allow the use of padding to further enhance the performance characteristics of an exemplary neural network environment (e.g., 100 of Fig. 1 and 200 of Fig. 2). As shown, the logical data map 500 (of exemplary input data not shown) can include various sliding windows (530, 540, 550, 560, 570, and 580) that straddle one or more lines (e.g., 510 and 520). Additionally, the logical data map 500 can also include padding 580.

In an illustrative operation, the padding 580 can be added dynamically at run time of the exemplary neural network environment (100 of Fig. 1 or 200 of Fig. 2). The operation controller 235 of Fig. 2 can specify the amount of padding to be used on each of the dimensions 340 shown in Fig. 3 of the input data (e.g., a blob) (e.g., such that the dimensions taken together can be considered a data volume), and the neural network environment (e.g., iterator controller instructions) can operatively construct the data volume as if the padding were physically present in memory. Default values can also be generated by the exemplary neural network environment (e.g., iterator controller instructions) at the iterator output positions where the padding was added.
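A minimal sketch of such run-time padding follows, assuming a 2D slice and a default value of zero. The key point is that padded positions are never physically stored; the read path simply returns the default outside the real bounds (names are illustrative):

```python
def read_with_logical_padding(data, y, x, pad, default=0):
    """Return the element at logical (y, x) of the padded view of `data`.

    The padding is never materialized in memory: coordinates that fall
    inside the pad border simply yield `default`, mimicking an iterator
    that generates default values at padded output positions.
    """
    height, width = len(data), len(data[0])
    iy, ix = y - pad, x - pad
    if 0 <= iy < height and 0 <= ix < width:
        return data[iy][ix]
    return default
```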

Fig. 6 is a block diagram of exemplary line buffer data 600. As shown in Fig. 6, the exemplary line buffer input data 600 can include a boundaryless logical mapping 605 of the line buffer data 600. The logical mapping can include a height and a width as well as data elements (605(1), 605(2), 605(3), 605(4), 605(5), 605(6), 605(7), 605(8), 605(9), 605(10), 605(11), 605(12), 605(13), 605(14), 605(15), 605(16), 605(17), 605(18), 605(19), 605(20), 605(21), 605(22), 605(23), 605(24), 605(25), 605(26), etc.). The exemplary data elements can be stored in rows 610, 615, and 620 of the logical mapping and can be iterated using n sliding windows. The line buffer data 600 can also be expressed as a disbanded logical mapping 625 having contiguous data blocks, which has individual data segments 630 and 635. An individual data segment can represent the amount of data stored across one or more rows/lines of the line buffer input data. Further, as shown in Fig. 6, each of the disbanded data segments 630 and 635 can include one or more positions 640 where a line buffer input block straddles two rows of the input data. Operatively, an exemplary line buffer can accordingly shift memory block data according to a selected stride width such that non-adjacent memory blocks are written as adjacent memory blocks. In an illustrative implementation, the line buffer data 600 can include data retrieved from a cooperating data store and/or a cooperating iterator component.

As an example, when loading data for processing a convolutional layer, an exemplary line buffer can receive a memory block (e.g., 32 bytes of data), and by shifting the block to each successive neuron by the shift stride, a portion of that data block can be distributed to multiple neurons in a single cycle. In this manner, multiple overlapping/consecutive sliding windows of data can be extracted from a single block in a single cycle. When a sliding window jumps from one line of the input to another, and in the case of a kernel width of 1, there will be a discontinuity in the data from the last window of the upper row to the first window of the next row. Thus, even though the data for these windows resides in the same memory block in illustrative local memory, the line buffer might otherwise need to perform two writes to resolve this discontinuity.
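The single-cycle distribution of one block to several neurons can be pictured as extracting overlapping windows from that block; a hedged sketch follows, where the 32-byte block size and stride of 1 are just the example values above:

```python
def windows_from_block(block, kernel_width, stride=1):
    """Extract overlapping sliding windows from a single memory block so
    that each successive neuron can be handed one window in the same cycle."""
    return [block[i:i + kernel_width]
            for i in range(0, len(block) - kernel_width + 1, stride)]
```

Each of the returned windows would go to one cooperating neuron; note that this only works within one input row, which is exactly why the row-boundary discontinuity described above needs the shift-stride mechanism.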

Further, as shown in Fig. 6, the exemplary line data 600 can be physically stored in illustrative memory blocks. As shown, the exemplary memory blocks are illustratively depicted according to two exemplary memory block write scenarios 655 and 650. The exemplary line buffer data 670A in the illustrative first data write scenario 655 can include data written during a first cycle representing data 655A from multiple memory locations (e.g., 655(1), 650(2), 655(3), 655(4), 655(5), 655(6), 655(7), 655(8), and 655(9)). Similarly, the exemplary line buffer memory data 670A in the illustrative second data write scenario 650 can include data written during a second cycle representing data 650A from multiple memory locations (e.g., 650(10), 650(11), 650(12), 650(13), 650(14), 650(15), 650(16)).

Further, as shown in Fig. 6, according to an exemplary line buffer shift stride operation 660, the exemplary line buffer data can be written in a single-cycle write operation from the data stored in memory block 680. As shown, the memory block 680 can include multiple memory locations (e.g., 680(1), 680(2), 680(3), 680(4), 680(5), 680(6), 680(7), 680(8), 680(9), 680(10), 680(11), 680(12), 680(13), 680(14), 680(16), etc.). In an illustrative implementation, as shown in Fig. 6, the exemplary line buffer data 675A of line buffer 675 can be stored according to the line buffer shift stride operation 660. In the shift stride operation 660, a computed shift stride shifts the data of memory block 680 for writing as line buffer data 675A. The additional shift stride positions 645 can be written into the line buffer as part of line buffer 675, thereby allowing single-cycle processing 680A of the memory block 680 data, rather than a two-cycle write as in the memory block write scenarios 650 and 655.
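A minimal sketch of the shift-stride write of operation 660, under the assumption that a block holds two row segments split at a known discontinuity offset: the second segment lands on an extra line, offset by the shift stride, so one pass replaces the two writes of scenarios 650/655 (all names and parameters here are illustrative):

```python
def single_cycle_shift_write(block, split, line_width, shift_stride, fill=0):
    """Place one memory block containing two row segments into two
    line-buffer lines in a single pass.  `split` marks the discontinuity
    between the rows; the second segment is offset by `shift_stride`,
    and unused positions are padded with `fill`."""
    seg1, seg2 = block[:split], block[split:]
    line1 = (seg1 + [fill] * line_width)[:line_width]
    line2 = ([fill] * shift_stride + seg2 + [fill] * line_width)[:line_width]
    return [line1, line2]
```

The extra shifted positions (645 in Fig. 6) are the cost of the single-cycle write; they are later discarded when the outputs are saved, as described for Fig. 6A below.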

Illustratively, pursuant to the single-cycle processing 680A of memory block 680, as compared with the two-cycle processing 655A and 650A, the single write operation shown in Fig. 6 can produce an additional row in the line buffer. Operatively, as illustratively described in Fig. 6A, the NN can process the additional row of data in the line buffer as part of an NN data processing operation, such as convolution, to generate output data, which can then be discarded during a run-time save operation on the output data generated by one or more cooperating processing units, such as neurons.

It will be appreciated that although the exemplary memory block data of Fig. 6 is shown with an applied shift stride value of 1 to allow the memory to be treated as having adjacent memory blocks, this shift stride is merely illustrative and can take any value required to achieve the desired storage results for the various data processing operations contemplated for an exemplary neural network environment.

Fig. 6A shows an exemplary logical data mapping of an illustrative output data environment 680 with output data generated according to two output data generation scenarios I and II. As shown in Fig. 6A, a number of neurons 682 can process data elements (not shown) to generate corresponding output data, such as 684(M), 686(M), 688(M), 690(M), and 692(M), for storage in a cooperating memory component 682(M), such as an exemplary line buffer. According to exemplary output data generation scenario I, neurons 684, 686, 688, and 690 can operatively process data elements (not shown) to generate corresponding output data 684(M), 686(M), 688(M), and 690(M). According to exemplary output data generation scenario II, neurons 684, 686, 688, 690, and 692 can operatively process data elements (not shown) to generate corresponding output data 684(M), 686(M), 688(M), 690(M), and 692(M). As shown according to exemplary output data generation scenario II, the exemplary output data element 692(M) generated by exemplary neuron 692 can be operatively discarded during the output data write operation, as indicated by the shaded/hatched region of the exemplary logical data mapping.

In an illustrative implementation, exemplary output data generation scenario II can represent the data processing of an exemplary neural network environment deploying the shift stride operation described in Fig. 6, such that the additional data elements represented by the shift stride operation are processed by the exemplary neurons 682 to generate additional output data, which can be discarded during the illustrative storage of the output data in a cooperating memory component.
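The discard of scenario II can be sketched as a filtered output write: only the outputs of the first `valid_count` neurons are committed, while the extra element produced because of the shift stride (e.g., 692(M)) is dropped at write time. A sketch with assumed names:

```python
def write_valid_outputs(neuron_outputs, valid_count, memory):
    """Commit only the first `valid_count` neuron outputs to the cooperating
    memory component; trailing outputs that exist solely because of the
    shift-stride's extra row are discarded during the write, not stored."""
    memory.extend(neuron_outputs[:valid_count])
    return memory
```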

Fig. 7 is a flowchart of an illustrative process 700 for minimizing memory reads in an NN/DNN environment using a directed line buffer. As shown, processing begins at block 705, where one or more initialization parameters are received from a cooperating component of the neural computing environment (e.g., an operation controller), wherein the one or more initialization parameters can include data representing the dimensions of the input data and data representing a calculated data block discontinuity between rows of the input data. Processing then proceeds to block 710, where a shift stride that can be used to shift the retrieved data is calculated. Illustratively, the shift stride can be calculated using the one or more initialization parameters to generate one or more directed line buffer write instructions (LBWI).

Processing then proceeds to block 715, where data can be retrieved from a cooperating memory store and/or a cooperating iterator component of the neural network environment. Then, at block 720, the data is written into one or more rows of a line buffer associated with one or more processing units. The data can be written according to the generated directed line buffer write instruction (LBWI). The LBWI can include instructions to write the data into the line buffer shifted by one or more shift strides according to the initialization parameters received at block 705 that result in single-cycle processing of the input data in the line buffer.

Processing then proceeds to block 725, where the data is communicated to one or more cooperating processing units (e.g., neurons) for subsequent data processing. The processed data can then be used as input to one or more cooperating components of the neural network environment and/or a cooperating computing environment. Such output can be displayed for interaction by a participating user. Additionally, at block 725, when writing from the one or more cooperating processing units to other cooperating components of the neural network environment, the additional shift stride blocks that were written into the line buffer and processed by the one or more cooperating processing units can be discarded.

A check is then performed at block 735 to determine whether there is additional input data to be processed (i.e., as part of an iterative operation). If there is no additional input data, processing terminates at block 740. However, if additional input data requires an iterative operation, processing returns to block 705 and continues from there.
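Blocks 705 through 740 can be sketched end to end as a single loop, reusing a shifted single-pass write and standing in for the neurons with a plain callable. The split point and stride would in practice come from the initialization parameters of block 705; everything here is illustrative:

```python
def run_process_700(input_blocks, split, line_width, shift_stride, process):
    """Loop of Fig. 7: write each block into the line buffer with the
    computed shift stride (blocks 715/720), hand it to the processing
    units (block 725), repeat while input remains (block 735), then
    stop (block 740)."""
    outputs = []
    for block in input_blocks:                               # 735: more data?
        seg1, seg2 = block[:split], block[split:]            # 710/715: split at
        line1 = (seg1 + [0] * line_width)[:line_width]       # the discontinuity
        line2 = ([0] * shift_stride + seg2 + [0] * line_width)[:line_width]
        outputs.append(process(line1 + line2))               # 725: neurons
    return outputs                                           # 740: done
```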

The computer architecture 800 shown in Fig. 8 includes a central processing unit 802 ("CPU"), a system memory 804 (including a random access memory 806 ("RAM") and a read-only memory ("ROM") 808), and a system bus 810 that couples the memory 804 to the CPU 802. A basic input/output system containing the basic routines that help to transfer information between elements within the computer architecture 800, such as during startup, is stored in the ROM 808. The computer architecture 800 further includes a mass storage device 812 for storing an operating system 814, other data, and one or more application programs.

The mass storage device 812 is connected to the CPU 802 through a mass storage controller (not shown) connected to the bus 810. The mass storage device 812 and its associated computer-readable media provide non-volatile storage for the computer architecture 800. Although the description of computer-readable media contained herein refers to a mass storage device, such as a solid state drive, a hard disk, or a CD-ROM drive, it should be appreciated by those skilled in the art that computer-readable media can be any available computer storage media or communication media that can be accessed by the computer architecture 800.

Communication media includes computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics changed or set in a manner so as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.

By way of example, and not limitation, computer storage media can include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. For example, computer media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, digital versatile disks ("DVD"), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer architecture 800. For purposes of the claims, the phrases "computer storage medium," "computer-readable storage medium," and variations thereof do not include waves, signals, and/or other transitory and/or intangible communication media, per se.

According to various techniques, the computer architecture 800 can operate in a networked environment using logical connections to remote computers 805 through a network 820 and/or another network (not shown). The computer architecture 800 can connect to the network 820 through a network interface unit 816 connected to the bus 810. It should be appreciated that the network interface unit 816 can also be utilized to connect to other types of networks and remote computer systems. The computer architecture 800 can also include an input/output controller 818 for receiving and processing input from a number of other devices, including a keyboard, physical sensors 825, a mouse, or an electronic stylus (not shown in Fig. 8). Similarly, the input/output controller 818 can provide output to a display screen, a printer, or other type of output device (also not shown in Fig. 8). It should also be appreciated that, via a connection to the network 820 through the network interface unit 816, the computer architecture can enable the DNN module 105 to communicate with the computing environment 100.

It should be appreciated that the software components described herein can, when loaded into the CPU 802 and/or DNN module 105 and executed, transform the CPU 802 and/or DNN module 105 and the overall computer architecture 800 from a general-purpose computing system into a special-purpose computing system customized to facilitate the functionality presented herein. The CPU 802 and/or DNN module 105 can be constructed from any number of transistors or other discrete circuit elements and/or chipsets that can individually or collectively assume any number of states. More specifically, the CPU 802 and/or DNN module 105 can operate as a finite-state machine in response to the executable instructions contained within the software modules disclosed herein. These computer-executable instructions can transform the CPU 802 by specifying how the CPU 802 transitions between states, thereby transforming the transistors or other discrete hardware elements constituting the CPU 802.

Encoding the software modules presented herein can also transform the physical structure of the computer-readable media presented herein. The specific transformation of physical structure can depend on various factors in different implementations of this description. Examples of such factors can include, but are not limited to, the technology used to implement the computer-readable media, whether the computer-readable media is characterized as primary or secondary storage, and the like. For example, if the computer-readable media is implemented as semiconductor-based memory, the software disclosed herein can be encoded on the computer-readable media by transforming the physical state of the semiconductor memory. For example, the software can transform the state of the transistors, capacitors, or other discrete circuit elements constituting the semiconductor memory. The software can also transform the physical state of such components in order to store data thereupon.

As another example, the computer-readable media disclosed herein can be implemented using magnetic or optical technology. In such implementations, the software presented herein can transform the physical state of the magnetic or optical media when the software is encoded therein. These transformations can include altering the magnetic characteristics of particular locations within given magnetic media. These transformations can also include altering the physical features or characteristics of particular locations within given optical media to change the optical characteristics of those locations. Other transformations of physical media are possible without departing from the scope and spirit of this description, with the foregoing examples provided only to facilitate this discussion.

In light of the above, it should be appreciated that many types of physical transformations take place in the computer architecture 800 in order to store and execute the software components presented herein. It should also be appreciated that the computer architecture 800 can include other types of computing devices, including handheld computers, embedded computer systems, personal digital assistants, and other types of computing devices known to those skilled in the art. It is also contemplated that the computer architecture 800 may not include all of the components shown in Fig. 8, can include other components that are not explicitly shown in Fig. 8, or can utilize an architecture completely different from that shown in Fig. 8.

The computing system 800 described above can be deployed as part of a computer network. In general, the above description of computing environments applies to both server computers and client computers deployed in a network environment.

Fig. 9 illustrates an exemplary illustrative networked computing environment 900, in which the apparatus and methods described herein can be employed, with a server in communication with client computers via a communications network. As shown in Fig. 9, server(s) 905 can be interconnected via a communications network 820 (which can be any one or a combination of a fixed-wire or wireless LAN, WAN, intranet, extranet, peer-to-peer network, virtual private network, the Internet, a Bluetooth communications network, a proprietary low-voltage communications network, or other communications network) with a number of client computing environments, such as a tablet personal computer 910, a mobile telephone 915, a telephone 920, personal computer(s) 801, a personal digital assistant 925, a smart phone watch/personal goal tracker (e.g., Apple watch, Samsung, FitBit, etc.) 930, and a smart phone 935. In a network environment in which the communications network 820 is the Internet, for example, the server(s) 905 can be dedicated computing environment servers operable to process and communicate data to and from the client computing environments 801, 910, 915, 920, 925, 930, and 935 via any of a number of known protocols, such as the hypertext transfer protocol (HTTP), file transfer protocol (FTP), simple object access protocol (SOAP), or wireless application protocol (WAP). Additionally, the networked computing environment 900 can utilize various data security protocols, such as secured socket layer (SSL) or pretty good privacy (PGP). Each of the client computing environments 801, 910, 915, 920, 925, 930, and 935 can be equipped with an operating system 814 operable to support one or more computing applications or terminal sessions, such as a web browser (not shown), other graphical user interface (not shown), or a mobile desktop environment (not shown), to gain access to the server computing environment(s) 905.

The server(s) 905 can be communicatively coupled to other computing environments (not shown) and receive data regarding the interactions/resource network of a participating user. In an illustrative operation, a user (not shown) can interact with a computing application running on the client computing environment(s) to obtain desired data and/or computing applications. The data and/or computing applications can be stored on the server computing environment(s) 905 and communicated to cooperating users through the client computing environments 801, 910, 915, 920, 925, 930, and 935 over the exemplary communications network 820. A participating user (not shown) can request access to specific data and applications housed in whole or in part on the server computing environment(s) 905. These data can be communicated between the client computing environments 801, 910, 915, 920, 925, 930, 935 and the server computing environment(s) 905 for processing and storage. The server computing environment(s) 905 can host computing applications, processes, and applets for the generation, authentication, encryption, and communication of data and applications, and can cooperate with other server computing environments (not shown), third party service providers (not shown), network attached storage ("NAS"), and storage area networks ("SAN") to realize application/data transactions.

Example Clauses

The disclosure presented herein can be considered in view of the following clauses.

Example Clause A, a system for enhanced data processing, the system comprising: at least one processor; at least one line buffer operable to read and/or write data; and at least one memory in communication with the at least one processor, the at least one memory having computer-readable instructions stored thereupon that, when executed by the at least one processor, cause the at least one processor to: receive one or more initialization parameters from a cooperating controller component of a neural network environment, the initialization parameters comprising data representative of the dimensions of the data to be processed by the neural network environment and data representative of one or more discontinuities of one or more data elements between one or more rows of the data; load data from a cooperating memory component of the neural network environment; calculate a shift stride representative of a number of bits according to the initialization parameters; receive one or more instructions from the cooperating controller component of the neural network environment to shift one or more data elements of the data; shift the data elements into the loaded data to generate data shifted by the shift stride for writing into the at least one line buffer; and communicate the data written into the at least one line buffer to one or more processing components of the neural network environment for processing.

Example Clause B, the system of Example Clause A, wherein the application of the shift stride results in single-cycle processing of the line buffer data in the at least one line buffer.

Example Clause C, the system of Example Clauses A and B, wherein the computer-readable instructions further cause the at least one processor to communicate data traversed by a cooperating iterator to the line buffer.

Example Clause D, the system of Example Clauses A through C, wherein the computer-readable instructions further cause the at least one processor to traverse the data using one or more sliding windows, a window being operable to select one or more data elements of a data volume as the one or more portions communicated to the one or more processing components.

Example Clause E, the system of Example Clauses A through D, wherein the computer-readable instructions further cause the at least one processor to traverse the loaded data using one or more sliding windows that straddle a data dimension boundary of the loaded data.

Example Clause F, the system of Example Clauses A through E, wherein the computer-readable instructions further cause the at least one processor to insert one or more data paddings into the loaded data.

Example Clause G, the system of Example Clauses A through F, wherein the computer-readable instructions further cause the at least one processor to process one or more additional bits by the one or more processing units to generate output data for writing into the at least one memory, and to discard the processed one or more additional bits when performing the output data write.

Example Clause H, a computer-implemented method, comprising: receiving one or more initialization parameters from a cooperating controller component of a neural network environment, the initialization parameters comprising data representative of the dimensions of the data to be processed by the neural network environment and data representative of one or more discontinuities of one or more data elements between one or more rows of the data; loading data from a cooperating memory component of the neural network environment; iterating the loaded data according to a selected iterative operation by a cooperating iterator component of the neural network environment; calculating a shift stride representative of a number of bits according to the initialization parameters for insertion into one or more data elements of the data; receiving one or more instructions from the cooperating controller component of the neural network environment; applying the shift stride to the loaded data according to the one or more initialization parameters to generate directed line buffer data, and writing the directed line buffer data into the line buffer; and communicating the data written into the line buffer to one or more processing components of the neural network environment for processing.

Example Clause I, the computer-implemented method of Example Clause H, wherein the one or more portions of the loaded data are unequal portions.

Example Clause J, the computer-implemented method of Example Clauses H and I, wherein a sliding window is operable to straddle a data dimension boundary of the data.

Example Clause K, the computer-implemented method of Example Clauses H through J, further comprising: inserting a padding sub-volume into the loaded data, the loaded data being defined by the one or more instructions received from the cooperating controller component and the received one or more initialization parameters.

Example Clause L, the computer-implemented method of Example Clauses H through K, further comprising: processing the data written into the line buffer by one or more processing units to generate output data.

Example Clause M, the computer-implemented method of Example Clauses H through L, further comprising: processing the output data by an output iterator component to discard the additional bits processed as a result of the application of the calculated shift stride.

Example Clause N, the computer-implemented method of Example Clauses H through M, further comprising: clearing the line buffer of the written directed line buffer data to receive additional directed line buffer data for writing into the line buffer.

Example Clause O, the computer-implemented method of Example Clauses H through N, further comprising: writing the directed line buffer data into a selected number of lines of the line buffer, wherein each line of the line buffer is associated with a cooperating processing unit of the neural network environment.

Example Clause P, a computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the one or more processors of the computing device to: receive one or more initialization parameters from a cooperating controller component of a neural network environment, the initialization parameters comprising data representative of the dimensions of the data to be processed by the neural network environment and data representative of one or more discontinuities of one or more data elements between one or more rows of the data; load data from a cooperating memory component of the neural network environment; iterate the loaded data according to a selected iterative operation by a cooperating iterator component of the neural network environment; calculate, according to the initialization parameters, a shift stride representative of a number of bits by which to shift one or more data elements of the data; receive one or more instructions from the cooperating controller component of the neural network environment; insert one or more bits into the loaded data to generate directed line buffer data, and write the directed line buffer data into one or more lines of a line buffer, wherein the one or more lines of the line buffer are associated with one or more processing components of the neural network environment; and communicate the data written into the one or more lines of the line buffer to the one or more processing components of the neural network environment associated with the one or more lines of the line buffer for processing.

Example Clause Q, the computer-readable storage medium of Example Clause P, wherein the instructions further cause the one or more processors of the computing device to insert an additional data volume into the loaded data.

Example Clause R, the computer-readable storage medium of Example Clauses P and Q, wherein the instructions further cause the one or more processors of the computing device to: process the written data by the one or more processing units to generate output data.

Example Clause S, the computer-readable storage medium of Example Clauses P through R, wherein the instructions further cause the one or more processors of the computing device to: discard one or more bits of the output data by an output iterator, the discarded bits representing the one or more bits inserted when applying the shift stride.

Example Clause T, the computer-readable storage medium of Example Clauses P through S, wherein the instructions further cause the one or more processors of the computing device to: traverse the loaded data using a logical data mapping of the loaded data, the traversal of the loaded data comprising applying one or more sliding windows to the logical data mapping to associate portions of the loaded data with one or more physical memory addresses.

Example Clause U, the computer-readable medium of Example Clauses P through T, wherein the memory component cooperates with a physical sensor capable of producing input data comprising audio data, video data, haptic sensory data, and other data for subsequent processing by the one or more cooperating processing units.

Example Clause V, the computer-readable medium of Example Clauses P through U, wherein the cooperating processing units electronically cooperate with one or more output physical components operable to receive, for human interaction, processed input data comprising audio data, video data, haptic data, and other data.

Example Clause W, the computer-readable medium of Example Clauses P through V, further comprising: shifting the loaded data first according to a first calculated shift bit value, and then shifting the loaded data according to another shift bit value.

Conclusion

In closing, although the various techniques have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended representations is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claimed subject matter.
