Data processing device and artificial intelligence processor

Document No.: 303616    Publication date: 2021-11-26

This technology, "Data processing device and artificial intelligence processor," was designed and created by 裴京, 马骋, 王冠睿 and 施路平 on 2021-08-27. Abstract: The present disclosure relates to a data processing apparatus and an artificial intelligence processor. The data processing apparatus is applied to a processing core of an artificial intelligence processor, the artificial intelligence processor including a plurality of processing cores, each processing core including a storage module and a data processing apparatus connected to the storage module. The data processing apparatus includes: an address generation module configured to generate an input address and an output address according to a control instruction; and a data conversion module, connected to the address generation module, configured to read first data from the storage module according to the input address, perform a data conversion operation on the first data to obtain second data, and write the second data into the output address of the storage module. The data processing apparatus of the embodiments of the present disclosure can perform data integration on the data stored in the storage module.

1. A data processing apparatus, applied to a processing core of an artificial intelligence processor, wherein the artificial intelligence processor includes a plurality of processing cores, each processing core includes a storage module and a data processing apparatus, the data processing apparatus is connected to the storage module, and the data processing apparatus includes:

an address generation module configured to generate an input address and an output address according to a control instruction; and

a data conversion module, connected to the address generation module, configured to read first data from the storage module according to the input address, perform a data conversion operation on the first data to obtain second data, and write the second data into the output address of the storage module,

wherein the data conversion operation comprises at least one of a data merging operation, a data splitting operation, a data migration operation, and a data type conversion operation.

2. The apparatus of claim 1, wherein the data conversion operation comprises the data merging operation,

the address generation module is configured to generate a plurality of input addresses and the output address according to the control instruction; and

the data conversion module is configured to read a plurality of first data from the storage module according to the plurality of input addresses, and to write second data generated by merging the plurality of first data into the output address of the storage module.

3. The apparatus of claim 1, wherein the data conversion operation comprises the data splitting operation,

the address generation module is configured to generate the input address and a plurality of output addresses according to the control instruction; and

the data conversion module is configured to read the first data from the storage module according to the input address, and to write a plurality of second data obtained by splitting the first data into the plurality of output addresses of the storage module, respectively.

4. The apparatus of claim 1, wherein the data conversion operation comprises the data migration operation,

the address generation module is configured to generate the input address and the output address according to the control instruction; and

the data conversion module is configured to read the first data from the storage module according to the input address, and to write second data determined according to the first data into the output address of the storage module.

5. The apparatus of claim 1, wherein the data conversion operation comprises the data type conversion operation,

the data conversion module is configured to perform the data type conversion operation on the first data to obtain the second data when a data type of the first data is different from a data type of the second data,

wherein the data type includes any one of a 32-bit integer int32, an 8-bit integer int8, and a three-valued data type.

6. The apparatus of claim 1, further comprising a line counting module,

wherein the line counting module is configured to count according to count pulses sent by a control module of the processing core, and to send the control instruction to the address generation module when the count is greater than or equal to a line number threshold.

7. The apparatus of claim 1, wherein the address generation module comprises a plurality of sets of counters,

and the plurality of sets of counters are configured to generate the input address and the output address, and to control, through counter logic, the data conversion module to perform the data conversion operation.

8. The apparatus of claim 7, wherein the address generation module comprises a first set of counters, a second set of counters, and a third set of counters,

the first set of counters is configured to generate a first address, the first address being an address of the input data;

the second set of counters is configured to generate a second address; and

the third set of counters is configured to generate a third address, the third address being an address of the output data,

wherein, when a data merging operation is performed, the second address is the input address; when a data splitting operation is performed, the second address is the output address; and when a data migration operation is performed, generation of the second address is skipped.

9. The apparatus of claim 1, wherein the output address comprises an address of a data sending area of a routing module located in the processing core,

and the data conversion module is configured to write the second data into the data sending area of the routing module, so that the routing module sends the second data.

10. An artificial intelligence processor comprising a plurality of processing cores, each processing core comprising a storage module and a data processing apparatus according to any one of claims 1 to 9.

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a data processing apparatus and an artificial intelligence processor.

Background

In recent years, the field of neuromorphic computing has developed rapidly; neural networks can be constructed with hardware circuits to simulate the functions of the brain. For example, neuromorphic chips may be used to build large-scale, parallel, low-power computing platforms that can support complex pattern learning.

In the related art, a neuromorphic chip includes a plurality of processing cores, and how to implement data integration within the processing cores has become a problem to be solved.

Disclosure of Invention

In view of this, the present disclosure provides a data processing apparatus and an artificial intelligence processor.

According to an aspect of the present disclosure, there is provided a data processing apparatus applied to a processing core of an artificial intelligence processor, the artificial intelligence processor including a plurality of processing cores, each processing core including a storage module and a data processing apparatus, the data processing apparatus being connected to the storage module, the data processing apparatus including:

an address generation module configured to generate an input address and an output address according to a control instruction; and

a data conversion module, connected to the address generation module, configured to read first data from the storage module according to the input address, perform a data conversion operation on the first data to obtain second data, and write the second data into the output address of the storage module, wherein the data conversion operation comprises at least one of a data merging operation, a data splitting operation, a data migration operation, and a data type conversion operation.

In one possible implementation, the data conversion operation includes a data merging operation,

the address generation module is configured to generate a plurality of input addresses and the output address according to the control instruction; and

the data conversion module is configured to read a plurality of first data from the storage module according to the plurality of input addresses, and to write second data generated by merging the plurality of first data into the output address of the storage module.

In one possible implementation, the data conversion operation includes a data splitting operation,

the address generation module is configured to generate the input address and a plurality of output addresses according to the control instruction; and

the data conversion module is configured to read the first data from the storage module according to the input address, and to write a plurality of second data obtained by splitting the first data into the plurality of output addresses of the storage module, respectively.

In one possible implementation, the data conversion operation includes a data migration operation,

the address generation module is configured to generate the input address and the output address according to the control instruction; and

the data conversion module is configured to read the first data from the storage module according to the input address, and to write second data determined according to the first data into the output address of the storage module.

In one possible implementation, the data conversion operation includes a data type conversion operation,

the data conversion module is configured to perform the data type conversion operation on the first data to obtain the second data when the data type of the first data is different from that of the second data,

wherein the data type includes any one of a 32-bit integer int32, an 8-bit integer int8, and a three-valued data type.

In one possible implementation, the apparatus further includes a line counting module,

and the line counting module is configured to count according to count pulses sent by a control module of the processing core, and to send the control instruction to the address generation module when the count is greater than or equal to a line number threshold.

In one possible implementation, the control instruction includes a primitive instruction sent by a control module of the processing core.

In a possible implementation manner, the address generation module includes a plurality of sets of counters, and the sets of counters are respectively configured to generate the input address and the output address, and to control, through counter logic, the data conversion module to perform the data conversion operation.

In one possible implementation, the address generation module includes a first set of counters, a second set of counters, and a third set of counters,

the first set of counters is configured to generate a first address, the first address being an address of the input data;

the second set of counters is configured to generate a second address; and

the third set of counters is configured to generate a third address, the third address being an address of the output data,

wherein, when a data merging operation is performed, the second address is the input address; when a data splitting operation is performed, the second address is the output address; and when a data migration operation is performed, generation of the second address is skipped.

In one possible implementation, the output address includes an address of a data sending area of a routing module located in the processing core,

and the data conversion module is configured to write the second data into the data sending area of the routing module, so that the routing module sends the second data.

In one possible implementation, the data type includes at least one of a 32-bit integer int32, an 8-bit integer int8, and a three-valued data type.

According to another aspect of the present disclosure, there is provided an artificial intelligence processor including a plurality of the above-described processing cores.

According to the embodiments of the present disclosure, the address generation module generates an input address and an output address according to a control instruction; the data conversion module reads first data from the storage module according to the input address, performs a data conversion operation on the first data to obtain second data, and writes the second data into the output address of the storage module, so that data integration of the data stored in the storage module can be realized.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.

FIG. 1 illustrates a block diagram of an artificial intelligence processor in accordance with an embodiment of the disclosure.

Fig. 2 shows a block diagram of a data processing apparatus of an embodiment of the present disclosure.

Fig. 3 shows a schematic diagram of storing feature map data in a storage module according to an embodiment of the present disclosure.

Fig. 4 shows a schematic diagram of a data processing apparatus of an embodiment of the present disclosure.

FIG. 5 illustrates a schematic diagram of a data conversion operation of an embodiment of the present disclosure.

FIG. 6 illustrates a schematic diagram of a data merge operation of an embodiment of the present disclosure.

Fig. 7 illustrates a schematic diagram of merging a plurality of first data into second data according to an embodiment of the disclosure.

FIG. 8 illustrates a schematic diagram of a data splitting operation of an embodiment of the present disclosure.

Detailed Description

Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.

The neuromorphic chip is a chip with a novel architecture that can simulate the working mechanism of the human brain. The neuromorphic chip may be organized around processing cores. For example, a neuromorphic chip may include 4096 processing cores (e.g., a 64 × 64 array), each of which may internally simulate 256 neurons in the biological sense, so that the chip may simulate a total of about 1 million neurons.

In one possible implementation, the processing core may organize the neurons in a two-dimensional mesh structure. A neuron may include axons, dendrites, a cell body, and the like, each of which may be realized by hardware circuits. The axons may correspond to the rows of the two-dimensional mesh, the dendrites to its columns, and the synapses to the cross-points of rows and columns. The axons may be used for sending data, the dendrites for receiving data, and the synapses for storing weights, where a weight scales the signal passed from the former neuron to the latter neuron across the synapse and corresponds to a weight parameter of the neural network. For example, when a neuron is not used, its weight may be set to 0, and when the neuron is needed for a calculation, a weight value may be assigned to it. The cell body may be used for arranging the data in the storage module, where the data arrangement includes data migration (Move), data merging (Merge), and data splitting (Split) operations; the cell body may also plan a corresponding routing path so that the data finally reaches a target processing core.

FIG. 1 illustrates a block diagram of an artificial intelligence processor in accordance with an embodiment of the disclosure. As shown in fig. 1, the artificial intelligence processor 100 includes a plurality of processing cores 101, and each processing core 101 may include a storage module, an operation module, a data processing apparatus, and the like.

The storage module may be used for storing various data, for example, pixel data of an image and weight data of a convolution kernel. The storage module may be an independent module or may be located inside another module, for example, inside a routing module. The operation module may include a multiplier-accumulator (MAC) array for performing operations based on the pixel data and the weight data. The data processing apparatus may be used for arranging the data stored in the storage module, for example, through data migration, data merging, and data splitting.

The present disclosure provides a data processing apparatus that may be used to arrange the data stored in a storage module.

Fig. 2 shows a block diagram of a data processing apparatus of an embodiment of the present disclosure. As shown in fig. 2, the data processing apparatus 60 may be applied to a processing core of an artificial intelligence processor including a plurality of processing cores, each of which includes a storage module 40, a control module 50, and a data processing apparatus 60 connected to the storage module and the control module. The data processing apparatus includes:

an address generation module 10 configured to generate an input address and an output address according to a control instruction; and

a data conversion module 20, connected to the address generation module, configured to read first data from the storage module according to the input address, perform a data conversion operation on the first data to obtain second data, and write the second data into the output address of the storage module, where the data conversion operation includes at least one of a data merging operation, a data splitting operation, a data migration operation, and a data type conversion operation.

For example, the data conversion operation may be a merging operation performed on a plurality of first data, a splitting operation performed on the first data, a migration operation performed on the first data, and the like. During any of these operations, a data type conversion may also be performed, for example, converting first data (or intermediate conversion data) of the 32-bit integer type int32 into the 8-bit integer type int8. There may be one or more first data and one or more second data; the present disclosure does not limit the type of data conversion operation or the numbers of first data and second data.

Through the data processing device of the embodiment of the disclosure, data arrangement of the storage data of the storage module can be realized.

The storage module of the processing core may be configured to store data and instructions related to neural network computation. In one example, the storage module may be a memory with a certain storage capacity that stores different kinds of data used in neural network computation, such as vectors, matrices, and tensors.

In one possible implementation manner, the storage module stores Feature Map data in units of vectors, where each vector contains the data along the Channel direction, which may be referred to as the depth of the vector. The channel data may be aligned to a 16-byte width, and the number of storage units occupied by one vector's channel data may be referred to as the vector length; the feature map data can thus be described by the number of vectors and the length of each vector.

Fig. 3 shows a schematic diagram of storing feature map data in a storage module according to an embodiment of the present disclosure. As shown in fig. 3, the feature map data is width × height × channel; taking width and height both equal to 3 as an example, the feature map contains 9 vectors. The data in the channel direction of each vector are aligned to a 16-byte width, resulting in a vector length of 3 storage units. As shown in fig. 3, the feature map data may be stored in memory in the following order: first along the Channel direction, then along the row direction of the vectors in the feature map data, and finally along the column direction.
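As an illustrative sketch (not part of the patent; the function names and the choice of int32 elements are assumptions), the storage layout described above can be modeled as follows:

```python
def vector_length(channels: int, bytes_per_element: int, unit_bytes: int = 16) -> int:
    """Number of 16-byte storage units occupied by one vector's channel data."""
    return -(-(channels * bytes_per_element) // unit_bytes)  # ceiling division

def vector_offsets(width: int, height: int, channels: int, bytes_per_element: int):
    """Start offset (in storage units) of each vector, stored row by row,
    matching the order: channel direction first, then rows, then columns."""
    vlen = vector_length(channels, bytes_per_element)
    return {(x, y): (y * width + x) * vlen
            for y in range(height) for x in range(width)}

# A 3x3 feature map with 12 int32 channels: 48 bytes of channel data
# round up to 3 storage units per vector, as in the example above.
offsets = vector_offsets(3, 3, 12, 4)
```

With these assumed parameters, the nine vectors start at offsets 0, 3, 6, ..., 24, consistent with a vector length of 3 storage units.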

It should be understood that the feature map data may be stored in the storage module in various orders; the present disclosure does not limit the type of data processed by the data processing apparatus or the form in which data is stored in the storage module. For ease of understanding, the present disclosure is described using the example of performing data conversion operations on feature map data stored in the storage module in the form of vectors.

In one possible implementation, the data conversion operation includes at least one of data merging, data splitting, data migration, and data type conversion operations.

In one possible implementation, the data conversion operation includes a data merging operation,

the address generation module is configured to generate a plurality of input addresses and the output address according to the control instruction; and

the data conversion module is configured to read a plurality of first data from the storage module according to the plurality of input addresses, and to write second data generated by merging the plurality of first data into the output address of the storage module.

In one possible implementation, the data conversion operation includes a data splitting operation,

the address generation module is configured to generate the input address and a plurality of output addresses according to the control instruction; and

the data conversion module is configured to read the first data from the storage module according to the input address, and to write a plurality of second data obtained by splitting the first data into the plurality of output addresses of the storage module, respectively.

In one possible implementation, the data conversion operation includes a data migration operation,

the address generation module is configured to generate the input address and the output address according to the control instruction; and

the data conversion module is configured to read the first data from the storage module according to the input address, and to write second data determined according to the first data into the output address of the storage module.

In one possible implementation, the data conversion operation includes a data type conversion operation,

the data conversion module is configured to perform the data type conversion operation on the first data to obtain the second data when the data type of the first data is different from that of the second data,

wherein the data type includes any one of a 32-bit integer int32, an 8-bit integer int8, and a three-valued data type.

For example, the data merging operation may be understood as merging a plurality of first data to generate the second data; for example, two vectors may be merged in the depth direction. The data splitting operation may be understood as splitting the first data into a plurality of second data; for example, a vector may be split in the depth direction. The data migration operation may be understood as converting first data at an input address of the storage module into second data and writing the second data into an output address of the storage module, where converting the first data into the second data may mean taking the first data as the second data directly, or performing a data type conversion on the first data to obtain the converted second data. In this way, a circular migration of vectors within the storage space can be realized.
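The three arrangement operations can be sketched in a few lines. This is a behavioral model only, with Python lists standing in for depth-direction vectors, not the hardware implementation:

```python
def merge(first_vectors):
    """Data merging: concatenate several first data in the depth direction."""
    merged = []
    for v in first_vectors:
        merged.extend(v)
    return merged

def split(vector, sizes):
    """Data splitting: cut one first data into several second data by depth."""
    parts, pos = [], 0
    for s in sizes:
        parts.append(vector[pos:pos + s])
        pos += s
    return parts

def migrate(memory, src, dst, length):
    """Data migration: copy first data at the input address to the output address."""
    memory[dst:dst + length] = memory[src:src + length]
```

Note that `split` is the inverse of `merge` when the split sizes match the merged vector lengths, mirroring how the two operations are paired in the description above.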

The data type conversion operation may refer to converting the data type; for example, if the target data type of the second data to be written into the storage module is int8 and the data type of the first data is int32, the data may be converted from int32 to int8. A data conversion operation may include at least one of data merging, data splitting, data migration, and data type conversion; for example, data merging and data splitting may be performed simultaneously, and data merging and data type conversion may also be performed simultaneously, which is not limited in this disclosure.

In this way, various operations such as data merging, data splitting, data migration, and data type conversion can be performed on the data in the storage module, so that the artificial intelligence processor can realize various data transformations in the neural network, for example, partial summation, image framing, data migration within the storage space, and zeroing of the storage space.

The control instruction may be used to determine the category of the data conversion operation and the corresponding address generation manner; for example, different data conversion operations may correspond to different numbers of input addresses and output addresses. The control instruction may also be used to determine the start and end positions of the input address and the output address, or the storage areas of the input address and the output address, and the like.

For example, when the data conversion operation is a data merging operation, the input address may include a first address and a second address, and the output address may include a third address. When the data conversion operation is a data splitting operation, the input address may include the first address, and the output address may include the second address and the third address. When the data conversion operation is a data migration operation, generation of the second address may be skipped; the input address may include the first address, and the output address may include the third address.

It should be understood that when the data conversion operation is a data merging operation, the input address may include more than two addresses and is not limited to the first address and the second address; likewise, when the data conversion operation is a data splitting operation, the output address may also include more than two addresses.

For ease of understanding, the following description assumes that when the data conversion operation is a data merging operation, the input address includes a first address and a second address, and the output address includes a third address; that when the data conversion operation is a data splitting operation, the input address includes the first address, and the output address includes the second address and the third address; and that when the data conversion operation is a data migration operation, the input address includes the first address and the output address includes the third address.

The control instruction may be from a control module connected to the data processing apparatus, or may be a control instruction generated by the data processing apparatus when a preset condition is met, which is not limited in this disclosure.

In some optional embodiments, the control instruction may be a Primitive instruction sent by the control module, for example, a Primitive instruction from a Primitive Instruction (PI) register in the control module.

In some alternative embodiments, the control instruction may be a control instruction generated by a line count module in the data processing apparatus.

Fig. 4 shows a schematic diagram of a data processing apparatus of an embodiment of the present disclosure. In some alternative embodiments, as shown in fig. 4, the data processing apparatus further comprises a line counting module,

and the line counting module is configured to count according to count pulses sent by the control module, and to send the control instruction to the address generation module when the count is greater than or equal to a line number threshold.

As shown in fig. 4, the line counting module may be a line counter.

FIG. 5 illustrates a schematic diagram of a data conversion operation of an embodiment of the present disclosure. As shown in fig. 5, the primitive instruction register of the control module sends count pulses to the line counting module; the line counting module counts the received pulses and sends a start control instruction to the address generation module when the count is greater than or equal to the line number threshold, and the address generation module generates the input address and the output address in response to the control instruction.

Through the line counting module, line-pipelined data conversion can be achieved. The line pipelining can be understood as follows: each time the storage module receives one line of feature map data, the primitive instruction register sends one count pulse. The line counting module increments its counter on each received count pulse; when the line count is greater than or equal to a preset line number threshold, the line counting module sends a start control signal to the address generation module, and the address generation module performs the data conversion operation according to the control signal.
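A minimal behavioral sketch of this trigger logic (the class name and callback are illustrative, not from the patent):

```python
class LineCounter:
    """Counts one pulse per received feature-map line and starts the
    address generation module once the line number threshold is reached."""

    def __init__(self, line_threshold, start_address_generation):
        self.count = 0
        self.line_threshold = line_threshold
        # Callback standing in for the control signal sent to the
        # address generation module.
        self.start_address_generation = start_address_generation

    def on_count_pulse(self):
        self.count += 1
        if self.count >= self.line_threshold:
            self.start_address_generation()
```

For example, with a threshold of 3, the first two pulses are absorbed and the third one triggers address generation, matching the "greater than or equal to" condition above.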

In a possible implementation manner, the address generation module includes a plurality of sets of counters, which are respectively used for generating the input address and the output address, and for controlling, through counter logic, the data conversion module to complete the data conversion operation. The counters may cycle according to the number of vectors and the vector length to generate the input address and the output address.

For example, the address generation module of the disclosed embodiments may include three sets of counters: a first set of counters for generating a first address Addr_in, a second set of counters for generating a second address Addr_ciso, and a third set of counters for generating a third address Addr_out.

In each type of data conversion operation, the first address Addr_in may be used as an input address and the third address Addr_out as an output address. The second address Addr_ciso may be used as either an input address or an output address, depending on the type of data conversion operation. For example, when a data merging operation is performed, the second address Addr_ciso may be used as an input address; when a data splitting operation is performed, it may be used as an output address; and when a data migration operation is performed, generation of the second address Addr_ciso may be skipped.
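As a sketch of how the three counter groups might cooperate during a data merging operation (the address names Addr_in, Addr_ciso, and Addr_out follow the description above; the particular looping scheme is an assumption):

```python
def merge_address_stream(base_in, base_ciso, base_out, num_vectors, len_a, len_b):
    """For each output vector: read len_a storage units via Addr_in and
    len_b units via Addr_ciso, then write len_a + len_b units via Addr_out,
    merging two source vectors in the depth direction."""
    stream = []
    for v in range(num_vectors):
        reads = [("Addr_in", base_in + v * len_a + i) for i in range(len_a)]
        reads += [("Addr_ciso", base_ciso + v * len_b + i) for i in range(len_b)]
        writes = [("Addr_out", base_out + v * (len_a + len_b) + i)
                  for i in range(len_a + len_b)]
        stream.append((reads, writes))
    return stream
```

Each entry of the returned stream pairs the read addresses with the write addresses for one merged vector, which is the cycling "according to the number of vectors and the vector length" mentioned above.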

In this way, the three sets of counters of the address generation module make it convenient to organize various kinds of data in the storage module. It should be understood that the embodiments of the present disclosure limit neither the form of the address generation module nor the number of its counters, as long as at least one of the data conversion operations of data merging, data splitting, data migration, and data type conversion can be implemented.
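The role each address plays per operation, as described above, can be summarized in a small sketch; the function name and role strings are assumptions made for illustration.

```python
# Sketch of how the three generated addresses take on input/output roles
# depending on the data conversion operation (merge, split, migrate).
def address_roles(op):
    roles = {"Addr_in": "input", "Addr_out": "output"}
    if op == "merge":
        roles["Addr_ciso"] = "input"    # second address read as input
    elif op == "split":
        roles["Addr_ciso"] = "output"   # second address written as output
    elif op == "migrate":
        pass                            # Addr_ciso generation is skipped
    else:
        raise ValueError(f"unknown operation: {op}")
    return roles
```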

In a possible implementation manner, the data conversion module is further configured to perform a data type conversion operation during the data conversion operation when the data type of the first data is different from the data type of the second data to be output.

Through the data conversion module, data merging, data splitting and data migration operations with data type conversion can be processed more efficiently.

In some optional embodiments, the data conversion module is configured to perform data type conversion on the first data according to the data type of the second data, and then perform the data conversion operation on the type-converted third data to obtain the second data.

For example, the data conversion module may convert the first data into third data of the same type as the second data to be output according to the data type conversion rule, and perform operations such as data merging, data splitting, or data migration on the third data to obtain the second data.

In some optional embodiments, the data conversion module is instead configured to first perform the data conversion operation on the first data to obtain third data, and then convert the data type of the third data into that of the second data; the order of the two steps is not limited in this disclosure.

In a possible implementation manner, the data conversion module may be used for conversion operations among three data types: int32, int8, and three-valued (ternary) data. For example, int32 data may be converted to three-valued data, int8 data may be converted to three-valued data, and int32 data may be converted to int8 data.

The output spike sequence of a neuron in a spiking neural network can be encoded with three-valued data. For example, 0, 1, and -1 may describe the non-active, active, and inhibited states of a spiking neural network neuron, respectively; the meaning assigned to the three values may vary. It should be understood that the data conversion module may also convert data into all-0 data, etc.; the present disclosure does not limit the conversion types of the data conversion module.
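As a minimal illustration of this encoding, the following sketch maps a membrane potential to the three values. The threshold parameters are assumptions made for the example and are not specified in the disclosure.

```python
# Encode a neuron state as three-valued data: 0 = non-active, 1 = active,
# -1 = inhibited. Thresholds are illustrative assumptions.
def encode_neuron_state(membrane_potential, fire_threshold, inhibit_threshold):
    if membrane_potential >= fire_threshold:
        return 1    # active: the neuron fires a spike
    if membrane_potential <= inhibit_threshold:
        return -1   # inhibited
    return 0        # non-active
```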

In one possible implementation, as shown in fig. 4, the data conversion module may include a plurality of paths and a data strobe (multiplexer). The paths may respectively be: int32 to ternary, int8 to ternary, int32 to int8, pass-through, and a 16-byte "0" path.

Therefore, the data conversion module can support three data type conversions when the data types of the input data and the output data differ. When the data types of the input data and the output data are the same, the pass-through path may be selected; when a clear operation on the output address is performed, the 16-byte "0" path may be selected so that the second data written to the output address is all-0 data. The present disclosure does not limit the number and types of paths included in the data conversion module.
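The path selection can be sketched as a simple dispatch over the five paths listed above. The individual conversion rules (sign-based ternarization, saturating narrowing) are placeholders chosen for the example; the disclosure does not specify them.

```python
# Sketch of the data strobe (multiplexer) selecting among the five paths.
def int32_to_ternary(x):
    return 0 if x == 0 else (1 if x > 0 else -1)   # sign rule, an assumption

def int32_to_int8(x):
    return max(-128, min(127, x))                  # saturating narrow, an assumption

PATHS = {
    "int32->ternary": int32_to_ternary,
    "int8->ternary": int32_to_ternary,   # same illustrative sign rule
    "int32->int8": int32_to_int8,
    "through": lambda x: x,              # data types already match
    "zero16B": lambda x: 0,              # clears the output to all-0 data
}

def strobe(path, value):
    return PATHS[path](value)
```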

For ease of understanding, the following description takes as an example an address generation module with three sets of counters that generate the first, second, and third addresses according to the control instruction, and explains how the data processing apparatus of the disclosed embodiments performs the data merging, data splitting, and data migration operations, both when the data types of the input data and the output data are the same and when they differ.

In one possible implementation, the data conversion operation is a data merge operation, the input address includes a first address and a second address, the output address includes a third address,

the data conversion module 20 is configured to read a plurality of first data from the storage module according to the first address and the second address, respectively, and write second data generated by merging the plurality of first data into a third address of the storage module.

As described above, when the data types of the first data and the second data are the same, the data strobe of the data conversion module may select the pass-through path and perform the merge operation on the plurality of first data.

For example, the first data corresponding to the first address and the first data corresponding to the second address may be understood as input data, and the data obtained by merging the plurality of first data may be understood as output data. In the following, the first data corresponding to the first address is denoted in, the first data corresponding to the second address is denoted ciso, and the merged second data is denoted out.

The first data in and the first data ciso may be input vectors having the same number of vectors and the same vector data type, while their vector lengths may differ.

In one possible implementation manner, the first data corresponding to the first address and the first data corresponding to the second address may be spliced into an output vector; in the spliced second data, the vector elements of the first data corresponding to the first address precede those of the first data corresponding to the second address.

For example, in the data merging process, the vector of in may be placed into the output vector first, followed by the ciso vector. Suppose in, ciso, and out are three vector sets whose smallest unit is a vector; the mathematical representation is as follows:

in = {in00, in01, in02}; ciso = {ciso00, ciso01, ciso02};

the second data obtained by performing the merge operation on the plurality of first data may be:

out = {out0, out1, out2}

wherein: out0 = {in00, ciso00}, out1 = {in01, ciso01}, out2 = {in02, ciso02};

that is, the second data out = {{in00, ciso00}, {in01, ciso01}, {in02, ciso02}}.

It should be noted that the above mathematical expressions are exemplary, and the present disclosure does not limit the merging manner.
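The merge rule above can be sketched minimally by treating each vector as a Python list; the function name is an assumption for illustration.

```python
# Merge paired vectors so that the elements of `in` precede the elements of
# `ciso` in each output vector, as in out0 = {in00, ciso00}.
def merge(in_vecs, ciso_vecs):
    return [a + b for a, b in zip(in_vecs, ciso_vecs)]
```

Merging pairs the k-th vector of each input set, matching the exemplary expressions above.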

In some optional embodiments, the data conversion module 20 is configured to perform a 0-complementing (zero-padding) operation on the remaining storage space when the vector number of the second data is greater than the sum of the vector numbers of the plurality of first data and/or the vector length of the second data is greater than the sum of the vector lengths of the plurality of first data.

FIG. 6 illustrates a schematic diagram of a data merge operation of an embodiment of the present disclosure. The vector number of the first data in is num_in and its vector length is Km_num_in; the vector number of the first data ciso is num_ciso and its vector length is Km_num_ciso; the vector number of the second data out is num_out and its vector length is Km_num_out.

As shown in fig. 6, when the data merge operation is performed, the total vector length of the second data out is 15 while the sum of the total vector lengths of the first data in and the first data ciso is 12, so the 0-complement operation is performed on the remaining storage space. The second data out is generated and written to the third address with Y0 merged from X0 of the first data in and X0 of the first data ciso, Y1 merged from X1 of in and X1 of ciso, and Y2 merged from X2 of in together with complementary zeros.
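The 0-complement behaviour of fig. 6 can be sketched as follows; the function and parameter names are assumptions.

```python
# Merge one pair of vectors into an output slot of fixed length, filling the
# remaining storage space with zeros (the 0-complement operation).
def merge_with_padding(x_in, x_ciso, out_len):
    out = list(x_in) + list(x_ciso)
    if len(out) < out_len:
        out += [0] * (out_len - len(out))   # 0-complement the remaining space
    return out
```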

In some alternative embodiments, the first data is not the same data type as the second data.

For example, different precisions may be obtained by setting different truncation bits; data type conversion may be realized by setting a parameter in_cut_start that marks the starting bit. It should be understood that the vector length may change during data type conversion: for example, when data of type int32 with vector length 4 is converted into a vector of type int8, the vector length changes from 4 to 1. The consecutive bits cut forward or backward from a target bit of the int32 first data may be taken as the int8 first data, or randomly selected bits may be taken; this is not limited in this disclosure.
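One plausible reading of the in_cut_start parameter is sketched below: 8 consecutive bits are cut from an int32 word starting at a given bit position and reinterpreted as a signed int8. The exact truncation rule is not fixed by the disclosure, so this is an assumption.

```python
# Cut 8 consecutive bits from a 32-bit word starting at bit `cut_start`
# and reinterpret them as a signed int8 value (two's complement).
def cut_int8(word32, cut_start):
    bits = (word32 >> cut_start) & 0xFF
    return bits - 256 if bits & 0x80 else bits
```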

Fig. 7 illustrates a schematic diagram of merging a plurality of first data into second data according to an embodiment of the disclosure.

As shown in fig. 7, X0 of the first data in is int32-type data, which can be converted into an int8 vector, changing the vector length from 4 to 1. Similarly, X0 of the first data ciso is converted into an int8 vector whose length changes from 4 to 1, and the two length-1 vectors are merged to obtain Y0 of the second data out. As mentioned above, a 0-complement operation may be performed during the data conversion process, which is not repeated here.

In one possible implementation, the data conversion operation is a data splitting operation, the input address includes a first address, the output address includes a second address and a third address,

the data conversion module 20 is configured to read first data from the storage module according to the first address, and write a plurality of second data obtained by splitting the first data into a second address and a third address of the storage module, respectively.

As described above, the first data in represents the input data, the second data out represents the first output data, and the first data ciso represents the second output data. When the data is split, it may be split in the order of the first data ciso and then the second data out. If the vector number of the first data in is smaller than the sum of the vector numbers of the second data out and the first data ciso, and/or the vector length of the first data in is smaller than the sum of their vector lengths, the data may be preferentially split into the second data out.

For example, suppose in, ciso, and out are three vector sets whose smallest unit is a vector; the mathematical representation is as follows:

in = {in00, in01, in02};

the vector split result can be expressed as:

out = {out0, out1, out2}; ciso = {ciso00, ciso01, ciso02};

wherein: in00 = {out0, ciso00}, in01 = {out1, ciso01}, in02 = {out2, ciso02}

It should be noted that the above mathematical expressions are exemplary illustrations, and the present disclosure does not limit the manner and form of data splitting. As mentioned above, when it is determined that the remaining storage space exists, the 0 complementing operation may be performed, which is not described herein again.
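The split rule above can be sketched as the inverse of the merge, dividing each input vector into a leading out part and a trailing ciso part; the function name and length parameters are assumptions.

```python
# Split one input vector into a leading part (written to the third address
# as out) and a trailing part (written to the second address as ciso).
def split(in_vec, out_len, ciso_len):
    assert len(in_vec) >= out_len + ciso_len
    return in_vec[:out_len], in_vec[out_len:out_len + ciso_len]
```

With out_len = 2 and ciso_len = 1, this mirrors the fig. 8 example, where X0 of the first data in is split into out0 of length 2 and ciso0 of length 1.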

FIG. 8 illustrates a schematic diagram of a data splitting operation of an embodiment of the present disclosure.

As shown in fig. 8, during data splitting, the data splitting operation is performed on the first data in according to the vector number and vector length of the second data out and of the first data ciso. In fig. 8, the vector length of the second data out is 2 and the vector length of the first data ciso is 1, so X0 of the first data in can be split into out0 (vector length 2) and ciso0 (vector length 1), with out0 written to the third address and ciso0 written to the second address.

When the data types of the second data and the first data are different, the data type conversion operation for performing the data splitting operation may refer to the data type conversion operation for performing the data merging operation, which is not described herein again.

In one possible implementation, the data translation operation is a data migration operation, the input address includes a first address, the output address includes a third address,

the data conversion module 20 is configured to read first data from the storage module according to the first address, and write second data determined according to the first data into a third address of the storage module.

The first data may be determined as the second data, or data obtained by performing data type conversion on the first data may be determined as the second data.

For example, when the data conversion operation is a data migration operation (e.g., the number or length of the second address in the control instruction is 0), the address generation module skips generation of the second address to generate the first address and the third address.

In this way, a direct data migration operation can be realized, and a data migration operation with data type conversion can also be realized: the data conversion module converts the data type of the first data and writes the result to the third address as the second data, thereby implementing the migration.
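The migration path can be sketched with a plain dict standing in for the storage module; the optional convert argument models the type-converting variant. All names are illustrative, not from the disclosure.

```python
# Copy `length` words from the first address to the third address,
# optionally applying a data type conversion on the way.
def migrate(memory, addr_in, addr_out, length, convert=None):
    for i in range(length):
        value = memory[addr_in + i]
        memory[addr_out + i] = convert(value) if convert else value
```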

In one possible implementation, the data processing apparatus, the storage module, and the control module may be located within a processing core of an artificial intelligence processor. The artificial intelligence processor may be a neuromorphic chip, and the processing core may be a functional core of the neuromorphic chip. The processing core may also include other modules, such as a computation module, a routing module, and the like.

In one possible implementation, the output address includes an address of a data sending area of a routing module located in the processing core,

the data conversion module is configured to write second data into the data sending area of the routing module, so that the routing module sends the second data.

In this way, the routing module can be directly started to send data.

In one possible implementation, the data processing apparatus is applicable to artificial neural network computation, spiking neural network computation, or computation of a hybrid neural network comprising both an artificial neural network and a spiking neural network.

The present disclosure also provides a processing core, the processing core includes the control module, the storage module and the data processing apparatus, the data processing apparatus includes:

the address generation module is used for generating an input address and an output address according to the control instruction;

and the data conversion module is connected to the address generation module and used for reading first data from the storage module according to the input address, performing data conversion operation on the first data to obtain second data and writing the second data into an output address of the storage module.

In a possible implementation manner, the processing core further includes a routing module with a data sending area, and the routing module is configured to send the second data of the data sending area, where the second data may be the second data written by the data processing apparatus.

In a possible implementation, the control module may include a PI (Primitive Instruction) register for controlling the storage module and/or the computation module. A primitive is a program segment consisting of several instructions that implements a particular function. Specifically, the PI register may be a 32-byte register. The PI register can control the number of groups operated in the multiply-accumulator array and the number of multiply-accumulators used in each group. For example, for a multiply-accumulator array divided into 4 groups of 32 multiply-accumulators each, a certain storage space can be allocated in the PI register so that only some of the multiply-accumulators work when neural network computation is executed; this reduces the energy consumption of the neuromorphic chip, shortens the computation time of the neural network, and improves its computational efficiency.

In one possible implementation, the PI register may be used to control the storage module. For example, the PI register may store an instruction prohibiting read/write operations on part of the storage area of the storage module, thereby controlling read/write access to that part of the data; this avoids data loss caused by erroneous operations on protected data during neural network computation. The PI register may also be configured with specific storage space for distinguishing different computation modes, such as convolution, vector accumulation, vector dot product, and tensor scaling. For example, when the data processing apparatus performs neural network computation, certain bits in the PI register may indicate a mode in which the multiply-accumulator array performs convolution operations, or a mode in which it performs dot-product operations. The PI register can also pre-store parameters used by the computation mode, such as stride and convolution kernel size, so that the address generation module can fetch the pre-stored parameters directly from the PI register for cyclic computation without repeated reads and writes, and can quickly generate addresses for different types of data.
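The kind of PI-register fields described above (compute mode, multiply-accumulator group counts, stride, kernel size) could be decoded as in the sketch below. The bit layout is invented for illustration; the disclosure does not specify one.

```python
# Decode hypothetical PI-register fields from a control word. The layout
# (field positions and widths) is an assumption made for this sketch.
def decode_pi(word):
    return {
        "mode":          word & 0xF,          # e.g. 0 = convolution, 1 = dot product
        "mac_groups":   (word >> 4) & 0x7,    # number of multiply-accumulator groups
        "macs_per_grp": (word >> 7) & 0x3F,   # multiply-accumulators used per group
        "stride":       (word >> 13) & 0xF,   # pre-stored stride parameter
        "kernel_size":  (word >> 17) & 0xF,   # pre-stored convolution kernel size
    }
```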

In one possible implementation, the control module may be configured to control the storage module and/or the computation module. For example, the control module may control when the storage module is read and written, and may also control which multiply-accumulators of the computation module work under different computation and storage modes. In one example, the control module may include a 128-bit-wide register. The computation modes may include neural network computation modes such as convolution, vector accumulation, average pooling, vector dot product, and full connection.

In a possible implementation manner, the computation module comprises a multiply-accumulator array, and the control module is further used to control the storage module and/or the computation module. Controlling the computation module includes: controlling, according to a preset processing category, the number of groups of multiply-accumulators participating in data computation and the number of multiply-accumulators in each group, so that the computation module computes the data.

The present disclosure also provides an artificial intelligence processor, the artificial intelligence processor includes a plurality of processing cores, each processing core includes a storage module, a control module and a data processing apparatus, the data processing apparatus is connected respectively the storage module with the control module, the data processing apparatus includes:

the address generation module is used for generating an input address and an output address according to the control instruction;

and the data conversion module is connected to the address generation module and used for reading first data from the storage module according to the input address, performing data conversion operation on the first data to obtain second data and writing the second data into an output address of the storage module.

In a possible implementation manner, the processing core may further include a storage module and a computation module; that is, the neuromorphic chip may adopt a processing architecture integrating storage and computation. The storage module may be configured to store data and instructions related to neural network computation, and the computation module may be configured to perform neural network computation (e.g., multiply-accumulate operations). In one example, the storage module may be a memory with a certain storage capacity that stores the different kinds of data used in neural network computation, such as vectors, matrices, and tensors, and the computation module may be a multiply-accumulator array (MACs) composed of 128 multiply-accumulators. The multiply-accumulator array may be divided into 4 groups of 32 multiply-accumulators each, used to perform multiply-add operations on the output data of the data processing apparatus, and may return the operation results to the data processing apparatus for further processing.

In a possible implementation manner, the processing core may perform artificial neural network computation, spiking neural network computation, or hybrid neural network computation mixing an artificial neural network and a spiking neural network. Different neural network computing tasks need to process data of different precisions. For example, for spiking neural networks, temporal information is an important consideration: a neuron of the spiking neural network is activated only when its membrane potential reaches a certain value, and when activated it generates a signal that is transmitted to other neurons to raise or lower their membrane potentials. It should be understood that before three-valued data is sent to the computation module for computation, it needs to be converted into data of a precision the computation module can accept.

In a possible implementation manner, the neuromorphic chip of the disclosure can adopt a chip architecture integrating storage and computation, in which the processor, the storage module, and the communication component are integrated together and information is processed locally; this better meets the demand for large-scale parallel computation in neural networks. The neuromorphic chip of the disclosure can support two computational paradigms: it supports not only artificial neural network computation but also spiking neural network computation and computation of a hybrid neural network mixing the two. In this way, the neuromorphic chip can compute neural networks in a distributed, large-scale, parallel manner with extremely low power consumption. It will be understood by those skilled in the art that the neuromorphic chip and the processing core are exemplary; the disclosure does not limit the processor architecture to which the data processing apparatus is applied, nor whether the processor is organized into processing cores.

In one possible implementation, different processing cores may process different neural network computing tasks, and the same processing core may also perform different neural network computing tasks in different work cycles. For example, during a certain work cycle, processing core A may process a neural network computing task on images while processing core B simultaneously processes a neural network computing task on text. Data of different neural network computing tasks can be allocated to one processing core or distributed across different processing cores. Data exchange between and within processing cores is therefore frequent for different kinds of data of different precisions, and a suitable solution for efficiently scheduling such data needs to be found.

A traditional chip adopts the von Neumann architecture, which separates the storage module from the processor: the processor repeatedly exchanges information with the storage module over a bus, read/write efficiency for data of different types and precisions is low, and bottlenecks in computing power and energy consumption arise. Meanwhile, existing neuromorphic chips handle data of different types and precisions in a decentralized way, that is, they adopt different processing methods for different computation tasks and data types. Existing neuromorphic chips can support only one computational paradigm, either artificial neural network computation or spiking neural network computation, such as neuromorphic chips that employ the Leaky-Integrate-and-Fire (LIF) neuron model. Because these chips adopt different processing methods for different computation tasks and data types, the number of accesses to the storage module increases, the data scheduling process becomes complicated, access efficiency is low, and information redundancy may result. Therefore, traditional neuromorphic chips cannot efficiently schedule data of different types and/or precisions, have low storage-module access efficiency, greatly limit the computational efficiency of the chip and the inference efficiency of the neural network, and scale poorly.

Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
