Flexible hardware for high-throughput vector dequantization with dynamic vector length and codebook size

Document No.: 1760344 | Publication date: 2019-11-29

Note: This technology, "Flexible hardware for high-throughput vector dequantization with dynamic vector length and codebook size," was created by A. A. Ambardekar, A. Tomic, C. B. McBride, G. Petre, K. D. Cedola, and L. M. Wall on 2018-04-06. Abstract: The performance of a neural network (NN) and/or a deep neural network (DNN) can be limited by the number of operations being executed as well as by the NN/DNN's management of memory data. Using vector quantization of neuron weight values, the processing of neuron data can optimize the number of operations as well as memory utilization to enhance the overall performance of the NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length, and each vector of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved on the fly at runtime by exemplary fast weight lookup hardware, as part of an exemplary data processing function of the NN and as part of an inline dequantization operation, to obtain the required one or more neuron weight values.

1. A system for enhanced data processing in a neural network environment, the system comprising:

at least one processor;

at least one memory component; and

at least one memory in communication with the at least one processor, the at least one memory having computer-readable instructions stored thereupon that, when executed by the at least one processor, cause the at least one processor to:

receive one or more initialization parameters from a cooperating controller component of the neural network environment, the initialization parameters comprising data representative of the dimensions of the data to be processed by the neural network environment and data representative of one or more vector quantization index values, the one or more index values representative of one or more vectors stored on the at least one memory component, the one or more vectors comprising data representative of one or more contiguous segments of one or more neuron weight values;

retrieve, using the one or more vector quantization index values, the one or more vectors representative of the one or more neuron weight values from the at least one memory component;

dequantize the retrieved one or more vectors to retrieve the underlying one or more neuron weight values; and

transmit the one or more neuron weight values for processing by one or more processing components of the neural network environment.

2. The system according to claim 1, wherein the one or more vectors are stored in a fast lookup table residing on the at least one memory component.

3. The system according to claim 2, wherein the one or more vectors have an arbitrary length.

4. The system according to claim 3, wherein the computer-readable instructions further cause the at least one processor to retrieve the one or more vectors from one or more rows of the fast lookup table.

5. The system according to claim 4, wherein a vector length of the one or more vectors is selectable for each neuron layer of the neuron layers of the neural network environment.

6. The system according to claim 5, wherein the computer-readable instructions further cause the at least one processor to perform vector dequantization of one or more neuron weight values for a selected one or more neuron layers of the neuron layers of the neural network environment.

7. The system according to claim 2, wherein the computer-readable instructions further comprise one or more hardware components operative to perform a fast lookup of the vectors stored on the fast lookup table.

8. A computer-implemented method, comprising:

receiving one or more initialization parameters from a cooperating controller component of a neural network environment, the initialization parameters comprising data representative of the dimensions of the data to be processed by the neural network environment and data representative of one or more vector quantization index values, the one or more index values representative of one or more vectors stored on at least one memory component, the one or more vectors comprising data representative of one or more contiguous segments of one or more neuron weight values, the one or more vectors being generated by a processor of the neural network environment;

retrieving, using the one or more vector quantization index values, the one or more vectors representative of the one or more neuron weight values from the at least one memory component, the one or more vectors being operatively stored on a fast lookup table;

dequantizing the retrieved one or more vectors to retrieve the underlying one or more neuron weight values; and

transmitting the one or more neuron weight values for processing by one or more processing components of the neural network environment.

9. The computer-implemented method according to claim 8, further comprising performing inline dequantization of the retrieved one or more vectors by one or more cooperating hardware components of the neural network environment to obtain the one or more neuron weight values.

10. The computer-implemented method according to claim 8, further comprising generating one or more virtualized fast lookup tables using a base index in a cooperating memory component for the generated one or more vectors.

11. The computer-implemented method according to claim 8, further comprising generating one or more vectors for one or more neuron layers of the neural network environment.

12. The computer-implemented method according to claim 11, further comprising storing the one or more vectors in one or more rows of a fast lookup table.

13. The computer-implemented method according to claim 12, further comprising generating one or more vectors of arbitrary length.

14. The computer-implemented method according to claim 8, further comprising selecting a vector length for the generation of the one or more vectors for each neuron layer of the neuron layers of the neural network environment.

15. The computer-implemented method according to claim 8, further comprising storing the generated one or more vectors in a local memory component.

Background

In an artificial neural network (NN), a neuron is the base unit used to model a biological neuron in the brain. The model of an artificial neuron includes the inner product of an input vector with a weight vector, added to a bias, with a non-linearity applied. For a deep neural network (DNN) (e.g., as expressed by an exemplary DNN module), a neuron can be closely mapped to an artificial neuron.

In processing data across a NN or DNN, the exemplary neurons performing exemplary processing operations are required to process large amounts of data in order to apply various data processing/manipulation operations, making the handling of that data a critical latent factor that adversely impacts overall NN or DNN performance against desired processing objectives (e.g., identifying an object and/or an object characteristic in exemplary input data such as an image, a sound, geographic coordinates, etc.). Typically, existing NNs and DNNs expend avoidable processing time (e.g., floating/fixed point operations performed per second (GFlops/s)) and memory space (e.g., bytes transferred per second (GBytes/s)) when performing these various operations. In particular, current practice requires that neuron weight values be read from a cooperating memory component prior to neuron processing. Generally, weight values can be stored in general-purpose memory (such as DRAM) or cached in fast local memory (such as SRAM). With general-purpose memory, time and power are expended to read the weight values. With local memory, high-performance cache memory is expensive and is typically limited in size. Since avoidable time/power is required to read the weight values either directly from general-purpose memory or indirectly from local cache memory, current practice falls short of fully optimizing the processing capabilities of a NN/DNN.

An inefficient conventional approach to overcoming this shortcoming of current practice is to reduce the precision of the weight data so as to reduce the required storage capacity. For example, 32-bit floating point weight values can be reduced to 16-bit half-precision values, resulting in a 50% savings in weight memory requirements. The problem with such a large reduction in weight value precision is a corresponding reduction in result accuracy.

A more advantageous NN/DNN would deploy operations allowing the use of vector quantization of neuron weight values, so that more weight values can be represented in a given amount of local memory, in turn reducing the overhead of loading weight values from main memory into the local memory cache and/or reducing the required local storage capacity. In particular, a vector quantization process can utilize a lookup table to convert weight codes into weight data. Operatively, by utilizing vector quantization, an entire blob of weights can be operatively expressed as weight codes that can be decoded during runtime.

More particularly, vector quantization of weight values can operatively convert contiguous segments of weight values into vectors of arbitrary length (e.g., 2 weight values, 4 weight values, etc.), and each vector can be assigned an index value. During the execution of a neuron computation operation requiring the weight values, the index is used to reference the specific vector in the lookup table for use in the computation. Since a single index is used to reference multiple weight values, a reduction in storage space is achieved without reducing the precision of the weight values.
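The compression scheme described above can be sketched in a few lines of Python (an illustrative sketch, not the patented hardware): contiguous segments of weight values are replaced by codebook indexes, and dequantization is a simple table lookup.

```python
import numpy as np

def quantize_weights(weights, vector_length):
    """Convert contiguous segments of weight values into codebook indexes.

    A toy scheme: the codebook is just the set of unique segments
    (a real encoder would cluster, trading precision for codebook size).
    """
    segments = weights.reshape(-1, vector_length)
    codebook, indexes = np.unique(segments, axis=0, return_inverse=True)
    return codebook, indexes

def dequantize(codebook, indexes):
    """Inline dequantization: each index fetches one row (vector) of weights."""
    return codebook[indexes].reshape(-1)

weights = np.array([1.0, 2.0, 1.0, 2.0, 3.0, 4.0, 3.0, 4.0])
codebook, indexes = quantize_weights(weights, vector_length=2)
restored = dequantize(codebook, indexes)
assert np.array_equal(restored, weights)  # lossless here: segments repeat exactly
assert indexes.size == weights.size // 2  # one index stored per 2-value vector
```

Note that the savings come from repetition: four two-value segments collapse to a two-row codebook plus four small indexes, with no loss of weight precision.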

It is with respect to these and other considerations that the disclosure made herein is presented.

Summary

The techniques described herein provide for the reduction of the memory requirements and processing cycles of an exemplary neural network (NN) and/or deep neural network (DNN) environment using vector quantization of neuron weight values. Aspects of the systems and methods described herein relate to machine/artificial intelligence (MI) hardware architectures. Such an architecture and its implementations can be referred to as an "NN". In an illustrative implementation, the use of vector quantization (VQ) in an exemplary NN can result in an increase in the performance of neurons reading weight values. In an illustrative operation, one or more indexes can be stored in one or more vector rows that can represent weight values using a fast lookup table (physical or virtual). A "weight" can be considered a numerical value consumed by a neuron processor when processing one or more data elements. Possible formats for a weight value include signed or unsigned, byte, integer, and/or floating point of any bit length. Since indexes rather than whole weight data are stored, a reduction in memory transfer can be achieved by using vector quantization.

In an illustrative implementation, one or more contiguous segments of weight values can be operatively converted to one or more vectors of arbitrary length, and each vector of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary VQ lookup table. In an illustrative operation, during an exemplary execution of a neuron computation operation that can require the weight values, an index can be retrieved from the generated lookup table that can represent the specific vector containing the one or more weight values. A neuron computation operation can be considered one or more computational steps performed by one or more neurons to process input data according to a selected operation (such as convolution or fully connected) to generate output data.

One or more rows of the VQ lookup table can be read from a cooperating memory component (such as a general-purpose or local memory component). The VQ lookup table can comprise N rows and be M wide, and can be operatively used by cooperating fast weight lookup hardware (FWLH) for quickly converting indexes into VQ rows. The FWLH can be considered hardware logic resident in the NN, operative to rapidly resolve an index into a VQ row of weight values of the VQ lookup table. In an illustrative implementation, the number of rows N can indicate the index range. For example, a 12-bit index can be required for 4096 vector rows. The width M, being the number of weight values per row, can take any value, including but not limited to multiples of 2, such as 2, 4, 8, and 16. Larger widths of the VQ lookup table can also be operatively deployed, if so required.
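The sizing arithmetic above can be checked with a short sketch (the 4096-row, 12-bit figure is from the text; the one-byte weight size and the million-weight layer are assumptions made for illustration):

```python
import math

def vq_table_stats(num_rows, vector_length, bytes_per_weight=1):
    """Index width and storage footprint of an N-row, M-wide VQ lookup table."""
    index_bits = math.ceil(math.log2(num_rows))        # bits needed per index
    table_bytes = num_rows * vector_length * bytes_per_weight
    return index_bits, table_bytes

index_bits, table_bytes = vq_table_stats(num_rows=4096, vector_length=4)
assert index_bits == 12       # 4096 vector rows require a 12-bit index
assert table_bytes == 16384   # 4096 rows x 4 one-byte weights

# Memory moved for a layer of 1,000,000 one-byte weights, encoded as indexes:
raw_bytes = 1_000_000 * 1                   # uncompressed weight stream
encoded_bytes = 1_000_000 // 4 * 12 // 8    # one 12-bit index per 4-weight vector
assert encoded_bytes == 375_000             # ~62% less traffic, before table overhead
```

The sketch shows why the index range and the row width pull in opposite directions: wider rows mean fewer indexes per layer, but a larger table to hold in local memory.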

In an illustrative operation, when a vector fetch is performed using an illustrative index value from the VQ lookup table, the corresponding weight values represented by the vector can be consumed by neurons as part of an exemplary data processing function of the NN.

In an illustrative implementation, the VQ lookup table can be stored in one or more cooperating hardware components of the NN, such as registers, SRAM, and DRAM. Such hardware components can be implemented using a fixed memory block or a virtualized memory block containing multiple VQ tables, with a single base index value used to select the current VQ lookup table.

In another illustrative implementation, one or more virtual VQ lookup tables can be defined over a single physical VQ table definition having a base index value. In an illustrative implementation, a selectable vector length related to the vector quantization process described herein can be utilized according to the neuron layer function, such that one neuron layer function can use vectors having a first VQ length (e.g., 2), another neuron layer function can use vectors having a second VQ length (e.g., 4), and yet another neuron layer function can use vectors having a third VQ length (e.g., 16). Illustratively, a neuron layer function can be considered one or more operations performed by one or more layers of an exemplary neural network environment.

In an illustrative implementation, the systems and methods described herein can be deployed as a "system on chip" in which one or more NNs are instantiated, such that the NN can include a VQ lookup table for weight values.

In an illustrative operation, at exemplary runtime, inline vector dequantization can be performed to determine the underlying neuron weight values, which can result in the maintenance of neuron throughput and maintain the optimized performance of the NN. In an illustrative operation, vector quantization as described herein can be enabled/disabled on a per-neural-network-layer basis.

Operatively, the use of vector quantization can result in a number of performance optimizations of the NN, including but not limited to: a reduction in the memory storage requirements for storing the neuron weight values associated with neural network computation; a reduction in the memory bandwidth required when executing a neuron layer function; a reduction in the time required when executing a neuron layer function; and a reduction in the amount of local cache memory required to achieve the desired level of performance of traditional neuron weight value memory management techniques, while retaining the high accuracy of the neuron weight value data.

It should be appreciated that, although described with respect to a system, the above-described subject matter can also be implemented as a computer-controlled apparatus, a computer process, a computing system, or an article of manufacture (such as a computer-readable medium and/or a dedicated chipset). These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings. This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description.

This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

Brief Description of the Drawings

The Detailed Description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same reference numbers in different figures indicate similar or identical items. References made to individual items of a plurality of items can use a reference number with a letter of a sequence of letters to refer to each individual item. Generic references to the items can use the specific reference number without the sequence of letters.

Fig. 1 illustrates a block diagram of an exemplary neural network computing environment in accordance with the systems and methods described herein.

Fig. 2 illustrates a block diagram of an exemplary neural network environment having cooperating components in accordance with the systems and methods described herein.

Fig. 3 illustrates a block diagram of exemplary input data represented in an illustrative logical data mapping in accordance with the systems and methods described herein.

Fig. 4 illustrates a block diagram of exemplary input data represented in an illustrative logical data mapping, showing the use of illustrative n sliding windows operative to straddle one or more lines of the illustrative logical data mapping.

Fig. 5 illustrates a block diagram of exemplary input data represented in an illustrative logical data mapping, showing the use of illustrative n sliding windows operative to straddle one or more lines of the illustrative logical data mapping and operative to allow for data padding as a processing enhancement, in accordance with the systems and methods described herein.

Fig. 6 is a block diagram showing the interaction of various components of an exemplary neural network environment operative to perform vector quantization/dequantization of neuron weight values in accordance with the systems and methods described herein.

Fig. 7 is a flow diagram of an illustrative process for processing data in an illustrative neural network computing environment according to the vector quantization/dequantization of required neuron weight values.

Fig. 8 shows additional details of an illustrative computer architecture of a computer capable of executing the methods described herein.

Fig. 9 shows additional details of illustrative computing devices cooperating in accordance with the systems and methods described herein.

Detailed Description

The following Detailed Description describes techniques for the reduction of the memory requirements and processing cycles of an exemplary neural network (NN) and/or deep neural network (DNN) environment using vector quantization of neuron weight values. Aspects of the systems and methods described herein relate to machine/artificial intelligence (MI) hardware architectures. Such an architecture and its implementations can be referred to as an "NN". In an illustrative implementation, the use of vector quantization (VQ) in an exemplary NN can result in an increase in the performance of neurons reading weight values. In an illustrative operation, one or more indexes can be stored in one or more vector rows that can represent weight values using a fast lookup table (physical or virtual). A "weight" can be considered a numerical value consumed by a neuron processor when processing one or more data elements. Possible formats for a weight value include signed or unsigned, byte, integer, and/or floating point of any bit length. By using vector quantization, indexes are stored rather than the whole weight data, and a reduction in memory transfer can be achieved.

In an illustrative implementation, one or more contiguous segments of weight values can be operatively converted to one or more vectors of arbitrary length, and each vector of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary VQ lookup table. In an illustrative operation, during an exemplary execution of a neuron computation operation that can require the weight values, an index can be retrieved from the generated lookup table that can represent the specific vector containing the one or more weight values. A neuron computation operation can be considered one or more computational steps performed by one or more neurons to process input data according to a selected operation (such as convolution or fully connected) to generate output data.

One or more rows of the VQ lookup table can be read from a cooperating memory component (such as a general-purpose or local memory component). The VQ lookup table can comprise N rows and be M wide, and can be operatively used by cooperating fast weight lookup hardware (FWLH) for quickly converting indexes into VQ rows. The FWLH can be considered hardware logic resident in the NN, operative to rapidly resolve an index into a VQ row of weight values of the VQ lookup table. In an illustrative implementation, the number of rows N can indicate the index range. For example, a 12-bit index can be required for 4096 vector rows. The width M, being the number of weight values per row, can take any value, including but not limited to multiples of 2, such as 2, 4, 8, and 16. Larger widths of the VQ lookup table can also be operatively deployed, if so required.

In an illustrative operation, when a vector fetch is performed using an illustrative index value from the VQ lookup table, the corresponding weight values represented by the vector can be consumed by neurons as part of an exemplary data processing function of the NN.

In an illustrative implementation, the VQ lookup table can be stored in one or more cooperating hardware components of the NN, such as registers, SRAM, and DRAM. Such hardware components can be implemented using a fixed memory block or a virtualized memory block containing multiple VQ tables, with a single base index value used to select the current VQ lookup table.

In another illustrative implementation, one or more virtual VQ lookup tables can be defined over a single physical VQ table definition having a base index value. In an illustrative implementation, a selectable vector length related to the vector quantization process described herein can be utilized according to the neuron layer, such that one neuron layer function can use vectors having a first VQ length (e.g., 2), another neuron layer function can use vectors having a second VQ length (e.g., 4), and yet another neuron layer function can use vectors having a third VQ length (e.g., 16). In an illustrative implementation, the systems and methods described herein can be deployed as a "system on chip" in which one or more NNs are instantiated, such that the NN can include a VQ lookup table for weight values. Illustratively, a neuron layer function can be considered one or more operations performed by one or more layers of an exemplary neural network environment.
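One way to picture the virtual-table arrangement above is the following sketch (the data layout, layer names, and codebooks are assumptions for illustration, not the hardware's actual register model): several per-layer codebooks with different vector lengths share one physical table, each addressed through its own base index.

```python
# One flat physical VQ table of weight vectors; each layer's virtual
# VQ table is a (base_index, vq_length) window into the shared storage.
physical_table = [
    [1.0, 2.0],             # rows 0-1: layer A codebook, VQ length 2
    [3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],   # row 2:   layer B codebook, VQ length 4
]

layer_tables = {
    "layer_a": {"base": 0, "vq_length": 2},
    "layer_b": {"base": 2, "vq_length": 4},
}

def lookup(layer, index):
    """Resolve a per-layer index through the layer's virtual-table base index."""
    cfg = layer_tables[layer]
    row = physical_table[cfg["base"] + index]
    assert len(row) == cfg["vq_length"]  # each layer's rows have its VQ length
    return row

assert lookup("layer_a", 1) == [3.0, 4.0]
assert lookup("layer_b", 0) == [5.0, 6.0, 7.0, 8.0]
```

Selecting the current table thus reduces to loading a single base index value, which is what makes per-layer vector lengths cheap to switch at runtime.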

In an illustrative operation, at exemplary runtime, inline vector dequantization can be performed to determine the underlying neuron weight values, which can result in the maintenance of neuron throughput and maintain the optimized performance of the NN. In an illustrative operation, vector quantization as described herein can be enabled/disabled on a per-neural-network-layer basis.

Operatively, the use of vector quantization can result in a number of performance optimizations of the NN, including but not limited to: a reduction in the memory storage requirements for storing the neuron weight values associated with neural network computation; a reduction in the memory bandwidth required when executing a neuron layer function; a reduction in the time required when executing a neuron layer function; and a reduction in the amount of local cache memory required to achieve the desired level of performance of traditional neuron weight value memory management techniques, while retaining the high accuracy of the neuron weight value data.

It should be appreciated that the above-described subject matter can be implemented as a computer-controlled apparatus, a computer process, a computing system, or an article of manufacture (such as a computer-readable medium). Among many other benefits, the techniques herein improve efficiencies with respect to a wide variety of computing resources. For example, the determination of the shift step can reduce the number of computing cycles needed to perform a number of complex tasks, such as facial recognition, object recognition, image generation, etc.

In addition, improved human interaction can be achieved by the introduction of more accurate and faster completion of such tasks. Further, the use of the shift step can reduce network traffic, reduce power consumption, and reduce memory usage. Other technical effects beyond those mentioned herein can also be realized from implementations of the technologies disclosed herein.

Although described with respect to a system, it should be appreciated that the above-described subject matter can also be implemented as a computer-controlled apparatus, a computer process, a computing system, or an article of manufacture (such as a computer-readable medium and/or a dedicated chipset). These and various other features will be apparent from a reading of the following description and a review of the associated drawings.

In artificial neural network, neuron is used to the basic unit to the biological neural Meta Model in brain.People The model of work neuron may include the inner product of input vector with the weight vectors for being added to the nonlinear biasing with application. Comparatively, in exemplary DDN module (for example, 105 of Fig. 1), neuron is mapped closely together artificial neuron.

Illustratively, the DNN module can be considered a superscalar processor. Operatively, it can dispatch one or more instructions to multiple execution units, called neurons. The execution units can be "simultaneous dispatch, simultaneous complete," where each execution unit is synchronized with all of the others. The DNN module can be classified as a SIMD (single instruction stream, multiple data stream) architecture.

Turning to the exemplary DNN environment 100 of Fig. 1, DNN module 105 has a memory subsystem with a unique L1 and L2 cache structure. These are not traditional caches, but are designed specifically for neural processing. For convenience, these cache structures have adopted names that reflect their intended purpose. By way of example, the L2 cache 150 can illustratively maintain a selected storage capacity (e.g., one megabyte (1MB)) with a high-speed private interface operating at a selected frequency (e.g., sixteen gigabits per second (16Gbps)). The L1 cache can maintain a selected storage capacity (e.g., eight kilobytes (8KB)) that can be split between kernel and activation data. The L1 cache can be referred to as the line buffer, and the L2 cache is referred to as BaSRAM.

The DNN module can be a recall-only neural network and can programmatically support a wide variety of network structures. Training for the network can be performed offline in a server farm or data center. The result of training is a set of parameters that can be known as either weights or kernels. These parameters represent a transform function that can be applied to an input, with the result being a classification or semantically labeled output.

In an illustrative operation, the DNN module can accept planar data as input. Input is not limited to image data only; as long as the data presented is in a uniform planar format, the DNN can operate on it.

The DNN module operates on a list of layer descriptors that correspond to the layers of a neural network. Illustratively, the list of layer descriptors can be treated by the DNN module as instructions. These descriptors can be prefetched from memory into the DNN module and executed in order.

Generally, there can be two main classes of layer descriptors: 1) memory-to-memory move descriptors; and 2) operation descriptors. Memory-to-memory move descriptors can be used to move data to/from main memory and to/from the local cache for consumption by the operation descriptors. Memory-to-memory move descriptors follow a different execution pipeline than the operation descriptors. The target pipeline for memory-to-memory move descriptors can be the internal DMA engine, whereas the target pipeline for the operation descriptors can be the neuron processing elements. Operation descriptors are capable of performing many different layer operations.

The output of the DNN is also a blob of data. The output can optionally be streamed to the local cache or streamed to main memory. The DNN module can prefetch data as far ahead as the software will allow. Software can control prefetching by using fencing and by setting dependencies between descriptors. Descriptors that have a dependency set are prevented from making forward progress until the dependency has been met.

Turning now to Fig. 1, an exemplary neural network environment 100 can comprise various cooperating components, including DNN module 105, cache memory 125 or 150, low-bandwidth fabric 110, bridge component 115, high-bandwidth fabric 120, SOC 130, PCIE "endpoint" 135, Tensilica node 140, memory controller 145, LPDDR4 memory 155, and input data source 102. Further, as is shown, DNN module 105 can also comprise a number of components, including prefetch 105(A), DMA 105(B), register interface 105(D), load/store unit 105(C), layer controller 105(D), save/restore component 105(E), and neurons 105(F). Operatively, the exemplary DNN environment 100 can process data according to a selected specification, wherein the DNN module performs one or more functions as described herein.

Fig. 2 illustrates an exemplary neural network environment 200 operable to employ a directed line buffer 220 as part of data processing. As is shown, the exemplary neural network environment 200 (also referred to herein as a computing device or a computing device environment) comprises one or more operation controllers 235 that cooperate with line buffer 220 to provide one or more instructions for data processing. Line buffer 220 can operate to receive data from cooperating external memory component 225 through external fabric 230 and fabric 215, and to receive one or more instructions/commands from iterator(s) 240 (e.g., hardware-based and/or virtualized iterators) (e.g., an instruction/command to read data from a cooperating memory component and/or an instruction to write data loaded from the cooperating memory component into the line buffer). Further, as is shown in Fig. 2, the exemplary neural network environment can also include fast weight lookup hardware 245 (FWLH), which can operatively receive a request for one or more quantized neuron weights received as a list of indexes into an exemplary codebook. In an illustrative operation, FWLH 245 can receive neuron weight data from one or more cooperating memory components (210, 225) through fabric 215. FWLH 245 can process the neuron weight index data to dequantize the received data into an equal number of vectors (i.e., codebook entries), which can be operatively written to line buffer 220.
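The FWLH data path described above can be approximated in software as follows (an illustrative model under assumed names; the real block is hardware logic): a list of codebook indexes arrives, each index is resolved against the VQ lookup table, and the resulting weight vectors are written into a line buffer.

```python
class FastWeightLookup:
    """Software model of fast weight lookup hardware (FWLH):
    resolves quantized weight indexes into codebook weight vectors."""

    def __init__(self, codebook):
        self.codebook = codebook  # the VQ lookup table: index -> weight vector

    def dequantize_to_line_buffer(self, indexes, line_buffer):
        # One codebook row (a vector of weights) is emitted per received index.
        for idx in indexes:
            line_buffer.extend(self.codebook[idx])
        return line_buffer

codebook = {0: [0.5, 0.5], 1: [1.5, -1.5], 2: [0.0, 2.0]}
fwlh = FastWeightLookup(codebook)
line_buffer = []
fwlh.dequantize_to_line_buffer([2, 0, 1], line_buffer)
assert line_buffer == [0.0, 2.0, 0.5, 0.5, 1.5, -1.5]
```

Because the lookup happens inline, the neurons downstream of the line buffer consume plain weight values and need no awareness of the quantization at all.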

Operatively, the line buffer 220 can shift data according to a selected stride width pursuant to one or more instructions received from the one or more operation controllers 235 (also referred to herein as "cooperating controller components 235"). Furthermore, the line buffer 220 can cooperate with one or more processing units (e.g., one or more neurons) to provide the written bit-shifted data for further processing, directly or indirectly, through the fabric 215. A neural network environment fabric can be a data bus capable of carrying various data. A line buffer can be considered a memory component capable of reading and writing data and/or data elements according to one or more received instructions.

In an illustrative operation, the exemplary neural network environment 200 can operatively process data according to the process described in Fig. 7. Specific to the components described in Fig. 2, these components are merely illustrative, as one of ordinary skill in the art would appreciate that the processing described in Figs. 6 and 7 can also be performed by components other than those illustrated in Fig. 2.

Also, as shown in Fig. 2, the exemplary neural network environment can optionally include one or more iterators (e.g., hardware-based and/or virtualized iterators) (as indicated by the dashed lines) that can illustratively operate to iterate input data (not shown) for processing by one or more neuron processors 205. It is to be appreciated by one skilled in the art that such inclusion of the exemplary one or more iterators is merely illustrative, as the inventive concepts described by the herein-disclosed systems and methods are also operative in an exemplary neural network environment 200 operating without any iterators.

Fig. 3 illustrates an example logical data mapping 300 for exemplary input data. As shown, data 305 can be represented as data having a certain dimension 340 comprising a channel count 310, a height 315, and a width 320 (e.g., such that the dimensions taken together can define a data volume). According to the herein-described systems and methods, the data 305 can be portioned and prepared for processing by cooperating n neurons 330, such that a first portion a can be communicated to a first neuron, a second portion b can be communicated to a second neuron, and so forth, until n portions are communicated to n neurons.

In an illustrative operation, the portions of data 305 can be determined using n sliding windows/kernels 325 based on one or more instructions provided by a cooperating controller component of an exemplary neural network environment (e.g., 200 of Fig. 2). Further, as shown, the input data portions a, b, c, and d can be addressed to a physical memory 325 using one or more initialization parameters provided by a cooperating operation controller component (235) of the exemplary neural network environment (e.g., 200 of Fig. 2).
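The portioning of the data volume across n neurons can be illustrated with a small sketch. The helper name is hypothetical, and the hardware addresses sliding windows into physical memory rather than slicing Python lists; this simply cuts a flattened data volume into n near-equal contiguous portions, portion a for the first neuron, portion b for the second, and so on.

```python
def portion_for_neurons(data, n):
    """Split a flattened data volume into n contiguous portions; the first
    len(data) % n portions receive one extra element each."""
    base, rem = divmod(len(data), n)
    portions, start = [], 0
    for i in range(n):
        size = base + (1 if i < rem else 0)
        portions.append(data[start:start + size])
        start += size
    return portions

# A 1x2x5 data volume (channels x height x width) flattened to 10 elements,
# portioned across 4 neurons.
parts = portion_for_neurons(list(range(10)), 4)
# parts -> [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Every element lands in exactly one portion, so the n neurons collectively cover the whole data volume without overlap.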

Fig. 4 illustrates an exemplary logical data map 400 of exemplary input data (not shown). The exemplary logical data map 400 includes a first line 410 (illustrated with diagonal hatching) and a second line 420 (illustrated with dashes). Each map line can include a number of sliding windows (e.g., 430, 440, and 450 for the first line 410, and 460, 470, and 480 for the second line 420). Additionally, as shown, the logical data map 400 shows the ability of the sliding windows to straddle a data dimension boundary of the input data (e.g., straddling the first line 410 and the second line 420). Such ability allows for increased performance, as more data can be prepared more efficiently for subsequent processing by the cooperating neural network processing components (e.g., 205 of Fig. 2).

Fig. 5 is similar to Fig. 4 and is presented to describe the ability of the herein-described systems and methods to allow for the use of padding to further enhance the performance characteristics of an exemplary neural network environment (e.g., 100 of Fig. 1 and 200 of Fig. 2). As shown, logical data map 500 (of exemplary input data not shown) can include various sliding windows (530, 540, 550, 560, 570, and 580) that straddle one or more lines (e.g., 510 and 520). Additionally, the logical data map 500 can also include padding 580.

In an illustrative operation, at runtime of an exemplary neural network environment (100 of Fig. 1 or 200 of Fig. 2), padding 580 can be added dynamically. The operation controller 235 of Fig. 2 can specify the amount of padding to be used on each of the dimensions 340 shown in Fig. 3 of the input data (e.g., blob) (e.g., the dimensions taken together can be considered a data volume), and the neural network environment (e.g., iterator controller instructions) can operatively construct the data volume as if the padding were physically present in memory. Default values can also be generated by the exemplary neural network environment (e.g., iterator controller instructions) in the iterator output positions where the padding was added.
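The dynamically added padding need not exist in physical memory. A minimal sketch of the idea follows (hypothetical names; the real behavior lives in iterator controller instructions): reads whose logical coordinates fall inside the pad region return a default value, while all other reads index the unpadded data actually stored in memory.

```python
def padded_read(mem, height, width, row, col, pad, default=0):
    """Read element (row, col) of the logically padded view of a
    height x width volume stored unpadded, row-major, in mem."""
    r, c = row - pad, col - pad
    if 0 <= r < height and 0 <= c < width:
        return mem[r * width + c]
    return default  # position lies in the padding: synthesize the default

mem = [1, 2, 3, 4]                              # a 2x2 volume, no padding stored
corner = padded_read(mem, 2, 2, 0, 0, pad=1)    # -> 0 (padding position)
center = padded_read(mem, 2, 2, 1, 1, pad=1)    # -> 1 (mem[0])
```

Only the coordinate check is extra work per read; no memory is spent on the pad ring, which is the performance point the figure makes.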

Fig. 6 is a diagram showing the interaction of various components of an exemplary neural network environment 600 operative to perform vector quantization/dequantization of neuron weight values. As shown in Fig. 6, the exemplary neural network environment 600 can include an exemplary neuron processor 605 (e.g., 100 of Fig. 1). The neuron processor 605 can also include fast weight lookup hardware 610 that operatively processes weight index data 615 and data from an exemplary fast weight lookup table 625 to retrieve/dequantize neuron weight values 620 for consumption by exemplary neurons 630. Further, as shown, the fast weight lookup table 625 can include a number of rows 625(a), 625(b), 625(c), and so on.

In an illustrative operation, one or more indices can be stored that represent one or more vector rows of weight values (e.g., 625(a), 625(b), 625(c)) of the fast lookup table 625, the fast lookup table illustratively being a physical hardware table or a virtualized table created in software. In an illustrative implementation, one or more contiguous segments of weight values can be operatively converted into one or more vectors of arbitrary length, and each of the one or more vectors can be assigned an index. The generated indices can be stored in the exemplary VQ lookup table 625.
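The conversion of contiguous weight segments into indexed vectors can be sketched as follows. Note that real vector quantization maps each segment to the nearest entry of a trained codebook; this simplified sketch (hypothetical names) assigns indices by exact-match deduplication only, which is enough to show the table/index relationship.

```python
def build_vq_table(weights, vec_len):
    """Cut the weight stream into segments of vec_len, store each distinct
    segment as one table row, and record one index per segment."""
    table, indices, seen = [], [], {}
    for i in range(0, len(weights), vec_len):
        vec = tuple(weights[i:i + vec_len])
        if vec not in seen:
            seen[vec] = len(table)   # next free row of the lookup table
            table.append(list(vec))
        indices.append(seen[vec])
    return table, indices

weights = [1, 2, 3, 4, 1, 2]         # the segment (1, 2) repeats
table, indices = build_vq_table(weights, 2)
# table   -> [[1, 2], [3, 4]]
# indices -> [0, 1, 0]
```

The compression comes from repeated segments collapsing to one table row: the weight stream is thereafter carried as small indices rather than full vectors.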

One or more rows of the VQ lookup table 625 can be read from a cooperating memory component (such as a general or local memory component). The VQ lookup table can include N rows of width M and can be operatively used by the fast weight lookup hardware (FWLH) 610 to quickly convert indices into VQ rows. The FWLH 610 can be considered hardware logic resident in the NN that can operate to rapidly perform the conversion of weight values to the VQ rows 625(a), 625(b), and 625(c) of the VQ lookup table.

In an illustrative implementation, the number of rows N can be indicative of the index range. For example, 4096 vector rows would require a 12-bit index. The width M, being the number of weight values per row, can take on an arbitrary value, which can include but is not limited to multiples of 2, such as 2, 4, 8, and 16. Larger widths of the VQ lookup table can also be operatively deployed if so required.
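The relationship between the row count N and the index width is ceil(log2 N): the 4096-row case in the text needs exactly 12 bits. A quick check in Python (the helper name and the byte-count example are illustrative, not values from the specification):

```python
import math

def index_bits(n_rows):
    """Bits required to address n_rows distinct table rows."""
    return max(1, math.ceil(math.log2(n_rows)))

assert index_bits(4096) == 12   # the example from the text
assert index_bits(256) == 8

# Illustrative table footprint: N rows x M weights per row, at an assumed
# 8 bits per stored weight value.
n_rows, m_width, bits_per_weight = 4096, 16, 8
table_bytes = n_rows * m_width * bits_per_weight // 8   # -> 65536
```

The footprint arithmetic shows the trade-off the text alludes to: widening M raises per-lookup throughput but grows the table linearly.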

In an illustrative operation, when a retrieval of a vector is performed using an illustrative index value from the VQ lookup table, the corresponding weight values 620 represented by the vector can be dequantized by the FWLH 610 according to one or more codebooks residing on a physical memory component 640 of the FWLH 610, and can be consumed by the neurons 630 as part of an exemplary data processing function of the NN.

As shown diagrammatically in Fig. 6, the fast weight lookup table 625 can store configuration data for an exemplary array of physical memory 640 that can be operated on by the FWLH 610 and that has one or more illustrative dynamic physical memory configurations 650, 660, and 670 used in storing vectors in a selected codebook for use in dequantization. A codebook is represented as a list of the vectors used to quantize the data. The position (index) of each vector in the list can operatively represent the quantized vector. More than one codebook can be used to achieve a desired dequantization rate. In an exemplary scenario, such as the second scenario 660 shown in Fig. 6, four codebooks are utilized and four indices are dequantized simultaneously, so that effectively sixteen weight entries (4 entries per row * 4 codebooks) can be dequantized to achieve the desired dequantization rate.
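The four-codebook configuration (660) can be mimicked in software as follows. This is a sketch under assumed names and toy data, not the hardware datapath: each cycle consumes one index per codebook, and with four codebooks whose rows each hold four weight values, sixteen weights are produced per cycle.

```python
def parallel_dequantize(index_groups, codebooks):
    """index_groups: one tuple per cycle, holding one index per codebook.
    Each index selects a full row of its codebook; rows are concatenated."""
    out = []
    for cycle in index_groups:
        for codebook, idx in zip(codebooks, cycle):
            out.extend(codebook[idx])
    return out

# Four toy codebooks, each with two rows of four weight values.
codebooks = [[[b * 10 + r * 4 + k for k in range(4)] for r in range(2)]
             for b in range(4)]
weights = parallel_dequantize([(0, 1, 0, 1)], codebooks)
assert len(weights) == 16    # 4 codebooks x 4 weights per row, per cycle
```

In hardware the four lookups happen in the same cycle rather than in a loop, which is precisely how the wider dequantization rate is obtained.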

In an illustrative operation, the one or more dynamic physical memory configurations 650, 660, and 670 can be used by one or more of the processing layers of the exemplary neural network environment if such vector quantization/dequantization processing is enabled for those one or more processing layers.

In an illustrative operation, the illustrative dynamic physical memory configurations can be configured by setting exemplary configuration registers (not shown), which can be resident components of the FWLH 610, to allow one of these exemplary physical memory configurations 650, 660, and 670 to be used by one or more of the processing layers of the exemplary neural network environment. Operatively, the dynamic physical memory can be used by the FWLH 610 to load thereon the exemplary codebooks as part of the vector dequantization processing.

In an illustrative implementation, each of the cooperating physical memories can maintain a copy of the codebook. Operatively, when an exemplary codebook is loaded into the physical memory, a single copy of the codebook can be copied from a cooperating memory component (e.g., DRAM) to the codebook memory, and the single codebook memory's data can be operatively and automatically written by the FWLH into the other cooperating physical memories.

It is to be appreciated that although the dynamic physical memory addresses are described in Fig. 6 as having exemplary bit counts and entries, such examples are merely illustrative, as the use of other alternative bit counts and entry counts is contemplated by the inventive concepts described herein.

Fig. 7 is a flow diagram of an illustrative process 700 utilizing vector quantization of neuron weight values to enhance the performance of an NN/DNN environment. As shown, processing begins at block 705, where one or more initialization parameters are received from a cooperating component (e.g., an operation controller) of the neural network environment, wherein the one or more initialization parameters can include data representative of the dimensions of input data comprising one or more contiguous segments of neuron weight values. Processing then proceeds to block 710, where the one or more contiguous segments of neuron weight values are converted into one or more vectors of arbitrary length and are assigned generated index values.

At block 715, the converted one or more vectors are then stored in one or more rows of a vector quantization lookup table. Processing then proceeds to block 720, where one or more rows of vectors are retrieved and dequantized, operatively using one or more of the index values generated at step 710, to obtain the underlying neuron weight values. At block 725, the retrieved weight values are then illustratively consumed by one or more neurons of an exemplary neuron processor component of the neural network environment.

A check is then performed at block 730 to determine whether there is additional input data to be processed (i.e., as part of an iterative operation). If there is no additional input data, processing terminates at block 735. However, if additional input data requires an iterative operation, processing then returns to block 710 and proceeds from there.
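Blocks 710 through 725 can be exercised end to end in miniature. In this sketch (hypothetical names; exact-match deduplication stands in for true codebook training), the weight stream is quantized to indices, the indices are then dequantized by table lookup, and the original weights are recovered for consumption.

```python
def vq_round_trip(weights, vec_len):
    """Quantize (blocks 710/715), then dequantize (block 720) and return
    the reconstructed weight stream for consumption (block 725)."""
    table, indices, seen = [], [], {}
    for i in range(0, len(weights), vec_len):          # block 710: segment
        vec = tuple(weights[i:i + vec_len])
        if vec not in seen:
            seen[vec] = len(table)                     # block 715: store row
            table.append(vec)
        indices.append(seen[vec])
    restored = [w for idx in indices for w in table[idx]]   # block 720
    return indices, restored

weights = [5, 6, 7, 8, 5, 6, 7, 8]
indices, restored = vq_round_trip(weights, 2)
# indices -> [0, 1, 0, 1]; restored equals the original weight stream
```

Because the table holds only the two distinct segments, the eight-value stream is carried as four small indices, yet the lookup reproduces it exactly.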

The computer architecture 800 illustrated in Fig. 8 includes a central processing unit 802 ("CPU"), a system memory 804, including a random access memory 806 ("RAM") and a read-only memory ("ROM") 808, and a system bus 810 that couples the memory 804 to the CPU 802. A basic input/output system (containing the basic routines that help to transfer information between elements within the computer architecture 800, such as during startup) is stored in the ROM 808. The computer architecture 800 further includes a mass storage device 812 for storing an operating system 814, other data, and one or more application programs.

The mass storage device 812 is connected to the CPU 802 through a mass storage controller (not shown) connected to the bus 810. The mass storage device 812 and its associated computer-readable media provide non-volatile storage for the architecture 800. Although the description of computer-readable media contained herein refers to a mass storage device (such as a solid-state drive, a hard disk, or a CD-ROM drive), it should be appreciated by those skilled in the art that computer-readable media can be any available computer storage media or communication media that can be accessed by the computer architecture 800.

Communication media includes computer-readable instructions, data structures, program modules, or other data in a modulated data signal (such as a carrier wave or other transport mechanism) and includes any delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media (such as a wired network or direct-wired connection) and wireless media (such as acoustic, RF, infrared, and other wireless media). Combinations of any of the above should also be included within the scope of computer-readable media.

By way of example, and not limitation, computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information (such as computer-readable instructions, data structures, program modules, or other data). For example, computer media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid-state memory technology, CD-ROM, digital versatile disks ("DVD"), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer architecture 800. For purposes of the claims, the phrases "computer storage media," "computer-readable storage media," and variations thereof do not include waves, signals, and/or other transitory and/or wireless communication media per se.

According to various techniques, the computer architecture 800 may operate in a networked environment using logical connections to remote computers through a network 820 and/or another network (not shown). The computer architecture 800 may connect to the network 820 through a network interface unit 816 connected to the bus 810. It should be appreciated that the network interface unit 816 also may be utilized to connect to other types of networks and remote computer systems. The computer architecture 800 also may include one or more input/output controllers 818 for receiving and processing input from a number of other devices, including a keyboard, mouse, or electronic stylus (not shown in Fig. 8). Similarly, the input/output controller 818 may provide output to a display screen, a printer, or other type of output device (also not shown in Fig. 8). It should also be appreciated that, via a connection to the network 820 through the network interface unit 816, the computing architecture may enable the DNN module 105 to communicate with the computing environment 100.

It should be appreciated that the software components described herein may, when loaded into the CPU 802 and/or the DNN module 105 and executed, transform the CPU 802 and/or the DNN module 105, and the overall computer architecture 800, from a general-purpose computing system into a special-purpose computing system customized to facilitate the functionality presented herein. The CPU 802 and/or the DNN module 105 may be constructed from any number of transistors or other discrete circuit elements and/or chipsets, which may individually or collectively assume any number of states. More specifically, the CPU 802 and/or the DNN module 105 may operate as a finite-state machine in response to executable instructions contained within the software modules disclosed herein. These computer-executable instructions may transform the CPU 802 by specifying how the CPU 802 transitions between states, thereby transforming the transistors or other discrete hardware elements constituting the CPU 802.

Encoding the software modules presented herein also may transform the physical structure of the computer-readable media presented herein. The specific transformation of physical structure may depend on various factors in different implementations of this description. Examples of such factors may include, but are not limited to, the technology used to implement the computer-readable media, and whether the computer-readable media is characterized as primary or secondary storage, and the like. For example, if the computer-readable media is implemented as semiconductor-based memory, the software disclosed herein may be encoded on the computer-readable media by transforming the physical state of the semiconductor memory. For example, the software may transform the state of transistors, capacitors, or other discrete circuit elements constituting the semiconductor memory. The software also may transform the physical state of such components in order to store data thereupon.

As another example, the computer-readable media disclosed herein may be implemented using magnetic or optical technology. In such implementations, the software presented herein may transform the physical state of magnetic or optical media when the software is encoded therein. These transformations may include altering the magnetic characteristics of particular locations within given magnetic media. These transformations also may include altering the physical features or characteristics of particular locations within given optical media to change the optical characteristics of those locations. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this discussion.

In light of the above, it should be appreciated that many types of physical transformations take place in the computer architecture 800 in order to store and execute the software components presented herein. It also should be appreciated that the computer architecture 800 may include other types of computing devices, including handheld computers, embedded computer systems, personal digital assistants, and other types of computing devices known to those skilled in the art. It is also contemplated that the computer architecture 800 may not include all of the components shown in Fig. 8, may include other components that are not explicitly shown in Fig. 8, or may utilize an architecture completely different than that shown in Fig. 8.

The computing system 800 described above can be deployed as part of a computer network. In general, the above description of computing environments applies to both server computers and client computers deployed in a network environment.

Fig. 9 illustrates an exemplary illustrative networked computing environment 900, with a server in communication with client computers via a communications network, in which the herein-described apparatus and methods may be employed. As shown in Fig. 9, server(s) 905 may be interconnected via a communications network 820 (which may be any of, or a combination of, a fixed-wire or wireless LAN, WAN, intranet, extranet, peer-to-peer network, virtual private network, the Internet, a Bluetooth communications network, a proprietary low-voltage communications network, or other communications network) with a number of client computing environments, such as a tablet personal computer 910, a mobile telephone 915, a telephone 920, a personal computer(s) 801, a personal digital assistant 925, a smartphone watch/personal goal tracker (e.g., Apple Watch, Samsung, FitBit, etc.) 930, and a smartphone 935. In a network environment in which the communications network 820 is the Internet, for example, the server(s) 905 can be dedicated computing environment servers operable to process and communicate data to and from the client computing environments 801, 910, 915, 920, 925, 930, and 935 via any of a number of known protocols, such as hypertext transfer protocol (HTTP), file transfer protocol (FTP), simple object access protocol (SOAP), or wireless application protocol (WAP). Additionally, the networked computing environment 900 can utilize various data security protocols such as secured socket layer (SSL) or pretty good privacy (PGP). Each of the client computing environments 801, 910, 915, 920, 925, 930, and 935 can be equipped with a computing environment 805 operable to support one or more computing applications or terminal sessions, such as a web browser (not shown), or other graphical user interface (not shown), or a mobile desktop environment (not shown), to gain access to the server computing environment(s) 905.

The server(s) 905 can be communicatively coupled to other computing environments (not shown) and receive data regarding participating users' interactions/resource network. In an illustrative operation, a user (not shown) may interact with a computing application running on a client computing environment to obtain desired data and/or computing applications. The data and/or computing applications may be stored on the server computing environment(s) 905 and communicated to cooperating users through the client computing environments 801, 910, 915, 920, 925, 930, and 935, over the exemplary communications network 820. A participating user (not shown) may request access to specific data and applications housed, in whole or in part, on the server computing environment(s) 905. These data may be communicated between the client computing environments 801, 910, 915, 920, 925, 930, 935 and the server computing environment(s) 905 for processing and storage. The server computing environment(s) 905 may host computing applications, processes, and applets for the generation, authentication, encryption, and communication of data and applications, and may cooperate with other server computing environments (not shown), third-party service providers (not shown), network attached storage (NAS), and storage area networks (SAN) to realize application/data transactions.

Example clause

The disclosure presented herein may be considered in view of the following clauses.

Example clause A, a system for enhanced data processing in a neural network environment (100), the system comprising: at least one processor, at least one memory component, and at least one memory in communication with the at least one processor, the at least one memory having computer-readable instructions stored thereupon that, when executed by the at least one processor, cause the at least one processor to:

receive one or more initialization parameters from a cooperating controller component of the neural network environment, the initialization parameters comprising data representative of the dimensions of data to be processed by the neural network environment and data representative of one or more vector quantization index values, the one or more index values being representative of one or more vectors stored on the at least one memory component, the one or more vectors comprising data representative of one or more contiguous segments of one or more neuron weight values; retrieve from the at least one memory component, utilizing the one or more vector quantization index values (615), the one or more vectors representative of the one or more neuron weight values; dequantize the retrieved one or more vectors to retrieve the underlying one or more neuron weight values; and communicate the one or more neuron weight values (620) for processing by one or more processing components (630) of the neural network environment.

Example clause B, the system of example clause A, wherein the one or more vectors are stored in a fast lookup table resident on the at least one memory component.

Example clause C, the system of example clauses A and B, wherein the one or more vectors have an arbitrary length.

Example clause D, the system of example clauses A through C, wherein the computer-readable instructions further cause the at least one processor to retrieve the one or more vectors from one or more rows of the fast lookup table.

Example clause E, the system of example clauses A through D, wherein the vector length of the one or more vectors is selectable for each of the neuron layers of the neural network environment.

Example clause F, the system of example clauses A through E, wherein the computer-readable instructions further cause the at least one processor to perform vector dequantization of one or more neuron weight values for a selected one or more of the neuron layers of the neural network environment.

Example clause G, the system of example clauses A through F, wherein the computer-readable instructions further comprise one or more hardware components operable to perform a fast lookup of the vectors stored on the fast lookup table.

Example clause H, a computer-implemented method, comprising: receiving one or more initialization parameters from a cooperating controller component of a neural network environment, the initialization parameters comprising data representative of the dimensions of data to be processed by the neural network environment and data representative of one or more vector quantization index values, the one or more index values being representative of one or more vectors stored on at least one memory component, the one or more vectors comprising data representative of one or more contiguous segments of one or more neuron weight values, the one or more vectors being generated by a processor of the neural network environment; retrieving from the at least one memory component, utilizing the one or more vector quantization index values, the one or more vectors representative of the one or more neuron weight values, the one or more vectors being operatively stored on a fast lookup table; dequantizing the retrieved one or more vectors to retrieve the underlying one or more neuron weight values; and communicating the one or more neuron weight values for processing by one or more processing components of the neural network environment.

Example clause H, the computer-implemented method according to example clause G, further comprising performing an inline dequantization of the retrieved one or more vectors by one or more cooperating hardware components of the neural network environment to obtain the one or more neuron weight values.

Example clause I, the computer-implemented method according to example clauses G and H, further comprising generating one or more virtualized fast lookup tables utilizing base indices of the generated one or more vectors in a cooperating memory component.

Example clause J, the computer-implemented method according to example clauses G through I, further comprising generating the one or more vectors for one or more neuron layers of the neural network environment.

Example clause K, the computer-implemented method according to example clauses G through J, further comprising storing the one or more vectors in one or more rows of the fast lookup table.

Example clause L, the computer-implemented method according to example clauses G through K, further comprising generating the one or more vectors having an arbitrary length.

Example clause M, the computer-implemented method according to example clauses G through L, further comprising selecting a vector length for the generation of the one or more vectors for each of the neuron layers of the neural network environment.

Example clause N, the computer-implemented method according to example clauses G through M, further comprising storing the generated one or more vectors in a local memory component.

Example clause O, a computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the one or more processors of the computing device to: receive one or more initialization parameters from a cooperating controller component of a neural network environment, the initialization parameters comprising data representative of the dimensions of the data to be processed by the neural network environment and data representative of one or more vector quantization index values, the one or more index values being representative of one or more vectors stored on at least one memory component, the one or more vectors comprising data representative of one or more contiguous segments of one or more neuron weight values; retrieve from the at least one memory component, utilizing the one or more vector quantization index values, the one or more vectors representative of the one or more neuron weight values; dequantize the retrieved one or more vectors to retrieve the underlying one or more neuron weight values; and communicate the one or more neuron weight values (620) for processing by one or more processing components (630) of the neural network environment.

Example clause P, the computer-readable storage medium of example clause O, wherein the instructions further cause the one or more processors of the computing device to: store the one or more vectors in one or more fast lookup tables.

Example clause Q, the computer-readable storage medium of example clauses O and P, wherein the instructions further cause the one or more processors of the computing device to: select a length of the one or more vectors.

Example clause R, the computer-readable storage medium of example clauses O through Q, wherein the instructions further cause the one or more processors of the computing device to: forgo retrieving vectors for one or more neuron layers of the neural network environment.

Example clause S, the computer-readable storage medium of example clauses O through R, wherein the instructions further cause the one or more processors of the computing device to: perform an inline dequantization of the one or more vectors to retrieve the underlying one or more neuron weight values.

Example clause T, the computer-readable medium of example clauses O through S, wherein the memory component cooperates with a physical sensor capable of producing input data comprising audio data, video data, haptic sensory data, and other data for subsequent processing by the one or more cooperating processing units.

Example clause U, the computer-readable medium of example clauses O through T, wherein the cooperating processing units electronically cooperate with one or more output physical components operative to receive, for human interaction, the processed input data comprising audio data, video data, haptic sensory data, and other data.

Conclusion

In closing, although the various embodiments have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended representations is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claimed subject matter.
