Integrated circuit device, electronic equipment, board card and calculation method

Document No.: 1963661    Publication date: 2021-12-14    Original language: Chinese

Reading note: This technology, "Integrated circuit device, electronic equipment, board card and calculation method", was designed and created by an undisclosed inventor on 2021-09-10. Abstract: The present disclosure relates to an integrated circuit device, an electronic device, a board card, and a method of performing a calculation using the foregoing integrated circuit device. The integrated circuit device may be included in a computing device of a combined processing apparatus, and the computing device may include one or more integrated circuit devices. The combined processing apparatus may further comprise an interface device and a processing device. The computing device interacts with the processing device to jointly complete a computing operation specified by the user. The combined processing apparatus may further comprise a storage device connected to the computing device and the processing device, respectively, for storing data of the computing device and the processing device. The disclosed scheme can reduce the volume of data transferred between internal devices and an external storage device, thereby minimizing the I/O bottleneck caused by bandwidth limitation and improving the overall performance of the integrated circuit device.

1. An integrated circuit device, comprising:

a data interface configured to transfer data between the integrated circuit device and an external memory; and

a multi-stage computing unit comprising a first-stage computing unit and multiple stages of downstream computing units cascaded in sequence, wherein each stage of computing unit is configured to:

receive first data, wherein the first-stage computing unit is configured to receive the first data from the external memory via the data interface, each stage of the downstream computing units is configured to receive the first data from its previous-stage computing unit, and the computing units of the multi-stage computing unit other than a last-stage computing unit are further configured to send the first data to a next-stage computing unit; and

perform a calculation according to the first data and second data stored in advance to obtain a calculation result.

2. The integrated circuit device of claim 1, wherein each stage of computing unit comprises a master computing unit and a plurality of slave computing units, wherein:

the master computing unit is configured to:

receive first data and send the first data to a corresponding plurality of slave computing units, wherein the master computing unit of the first-stage computing unit receives the first data from the external memory via the data interface, the master computing unit of each stage of downstream computing units receives the first data from the previous-stage computing unit, and the master computing units of the other computing units are further configured to send the first data to the next-stage computing unit; and

receive intermediate results from the corresponding plurality of slave computing units and compute the calculation result from the intermediate results,

each slave computing unit is configured to:

perform a calculation according to the first data and second data stored in advance to obtain an intermediate result; and

send the intermediate result to the master computing unit.

3. The integrated circuit device of claim 2, wherein each of the plurality of slave computing units comprises a master computing subunit and a plurality of slave computing subunits, wherein:

the master computing subunit is configured to:

receive the first data from the corresponding master computing unit and send the first data to the plurality of slave computing subunits; and

receive intermediate sub-results from the plurality of slave computing subunits, compute the intermediate result from the intermediate sub-results, and send the intermediate result to the corresponding master computing unit, wherein the corresponding master computing unit is the master computing unit that sent the first data to the master computing subunit;

each slave computing subunit is configured to:

perform a calculation according to the first data and second data stored in advance to obtain an intermediate sub-result; and

send the intermediate sub-result to the master computing subunit.

4. The integrated circuit device according to claim 3, wherein each of the plurality of master computing subunits is further configured to send the intermediate result to the external memory for storage.

5. The integrated circuit device according to claim 3 or 4, wherein the master computing subunit is further configured to broadcast the first data according to an output channel of the first data, so that the plurality of slave computing subunits obtain the first data.

6. The integrated circuit device according to claim 3 or 4, wherein the master computing subunit is further configured to send or broadcast the first data to the plurality of slave computing subunits according to respective output dimensions of the first data, so that the plurality of slave computing subunits obtain the first data.

7. The integrated circuit device of any of claims 2-4, wherein each stage of the multi-stage computing unit further comprises:

a control unit configured to control information interaction between the master computing unit and the plurality of slave computing units within that stage of computing unit.

8. An electronic device comprising the integrated circuit device according to any of claims 1-7.

9. A board card comprising an integrated circuit device according to any of claims 1-7.

10. A method of performing a computation using an integrated circuit device, wherein the integrated circuit device comprises a data interface for transferring data between the integrated circuit device and an external memory, and a multi-stage computing unit, wherein the multi-stage computing unit comprises a first-stage computing unit and multiple stages of downstream computing units cascaded in sequence, the method comprising performing the following at each stage of computing unit:

receiving first data, wherein the first-stage computing unit receives the first data from the external memory via the data interface, each stage of the downstream computing units receives the first data from its previous-stage computing unit, and the computing units of the multi-stage computing unit other than the last-stage computing unit also send the first data to the next-stage computing unit; and

calculating according to the first data and second data stored in advance to obtain a calculation result.
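The master/slave organization recited in claims 2 and 3 can be sketched in simplified form as follows. This is an illustrative software model only: the function names and the dot-product arithmetic are hypothetical, since the claims only require "calculating according to the first data and second data stored in advance".

```python
# Illustrative model of the master/slave split in claims 2-3.
# All names and the multiply-accumulate arithmetic are hypothetical.

def slave_compute(first_data, second_data):
    """A slave unit combines the forwarded first data with its
    pre-stored second data into an intermediate result."""
    return sum(a * b for a, b in zip(first_data, second_data))

def master_compute(first_data, slaves_second_data):
    """The master unit sends the first data to its slaves, collects
    their intermediate results, and reduces them into the stage's
    calculation result."""
    intermediates = [slave_compute(first_data, sd)
                     for sd in slaves_second_data]
    return sum(intermediates)

# Two slaves, each holding its own pre-stored second data:
result = master_compute([1, 2], [[3, 0], [0, 4]])   # 3 + 8 = 11
```

The same pattern recurses one level down in claim 3, with each slave computing unit acting as a master computing subunit over its own slave computing subunits.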

Technical Field

The present disclosure relates generally to the field of data processing. More particularly, the present disclosure relates to an integrated circuit device, an electronic apparatus, a board, and a calculation method.

Background

With the development of artificial intelligence, the amount of data involved in large-scale neural network operations, such as the convolution weight-gradient computation in back propagation, keeps growing, as do the corresponding storage requirements. Conventionally, such operations are performed by a processor such as a central processing unit ("CPU") or a graphics processing unit ("GPU"). However, even with parallel computing, the processor is limited by the capacity of its internal register resources, so massive computations cause a large amount of data interaction between the processor and external storage, reducing the device's computation and processing efficiency. Because the bandwidth of the input/output ("I/O") bus is limited, parallel-operation efficiency drops sharply, creating a serious I/O bottleneck. Moreover, not only can the bandwidth limitation of the I/O bus become a performance bottleneck, but the large volume of I/O accesses between the processor and external storage also incurs significant computational and power overhead.

Disclosure of Invention

To solve at least the above-mentioned technical problem, the present disclosure provides a solution that can reduce the amount of data transfer with an external storage device, minimizing the I/O bottleneck problem caused by bus bandwidth limitation. In particular, the present disclosure provides the aforementioned solutions in a number of aspects as follows.

In a first aspect, the present disclosure provides an integrated circuit device comprising:

a data interface configured to transfer data between the integrated circuit device and an external memory; and

a multi-stage computing unit comprising a first-stage computing unit and multiple stages of downstream computing units cascaded in sequence, wherein each stage of computing unit is configured to:

receive first data, wherein the first-stage computing unit is configured to receive the first data from the external memory via the data interface, each stage of the downstream computing units is configured to receive the first data from its previous-stage computing unit, and the computing units of the multi-stage computing unit other than a last-stage computing unit are further configured to send the first data to a next-stage computing unit; and

perform a calculation according to the first data and second data stored in advance to obtain a calculation result.

In a second aspect, the present disclosure provides an electronic device comprising the integrated circuit device according to the embodiments described above and below.

In a third aspect, the present disclosure provides a board card comprising an integrated circuit device according to the various embodiments described above and below.

In a fourth aspect, the present disclosure provides a method of performing a computation using an integrated circuit device, wherein the integrated circuit device comprises a data interface for transferring data between the integrated circuit device and an external memory, and a multi-stage computing unit, wherein the multi-stage computing unit comprises a first-stage computing unit and multiple stages of downstream computing units cascaded in sequence, the method comprising performing the following at each stage of computing unit:

receiving first data, wherein the first-stage computing unit receives the first data from the external memory via the data interface, each stage of the downstream computing units receives the first data from its previous-stage computing unit, and the computing units of the multi-stage computing unit other than the last-stage computing unit also send the first data to the next-stage computing unit; and

calculating according to the first data and second data stored in advance to obtain a calculation result.
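The data flow of this method can be sketched with a small software model. This is a hypothetical illustration, not the actual hardware: the class and function names are invented, and the per-stage arithmetic is a placeholder dot product. The key point it shows is that the first data leaves external memory exactly once (for the first stage) and every downstream stage receives it from the previous stage rather than from memory.

```python
# Hypothetical model of the cascaded computation: a single external read,
# then stage-to-stage forwarding of the first data.

class StageUnit:
    def __init__(self, second_data):
        self.second_data = second_data        # operand stored in advance
        self.result = None

    def compute(self, first_data):
        # Placeholder arithmetic; the disclosure does not fix the operation.
        self.result = sum(a * b for a, b in zip(first_data, self.second_data))
        return first_data                     # forwarded to the next stage

def run_cascade(external_memory_data, stages):
    first_data = external_memory_data         # the only external-memory read
    for stage in stages:
        first_data = stage.compute(first_data)
    return [stage.result for stage in stages]

results = run_cascade([3, 4, 5],
                      [StageUnit([1, 1, 1]),
                       StageUnit([2, 0, 1]),
                       StageUnit([0, 1, 2])])  # -> [12, 11, 14]
```

With N stages, the external memory is read once instead of N times, which is where the reduction in I/O traffic comes from.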

By using the integrated circuit device, electronic device, board card, and calculation method of the present disclosure, internal resources can be fully utilized and data can be transferred among the multi-stage computing units, thereby reducing the volume of I/O data transfer with the external memory. In addition, by significantly reducing data interaction with the external memory, the disclosed scheme also improves execution efficiency and alleviates the performance bottleneck caused by limited I/O bandwidth, thereby improving the overall performance of the integrated circuit device, electronic device, or board card.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:

FIG. 1 is a block diagram illustrating a board card according to an embodiment of the present disclosure;

FIG. 2 is a block diagram illustrating an integrated circuit device according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram illustrating the internal architecture of a single-core computing device according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram illustrating the internal architecture of a multi-core computing device according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram illustrating the internal structure of a processor core according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram illustrating the structure of an integrated circuit device according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram illustrating data transfer in an integrated circuit device according to an embodiment of the present disclosure;

FIG. 8 is a schematic block diagram illustrating a computing unit according to an embodiment of the present disclosure;

FIG. 9 is a schematic architectural diagram illustrating a slave computing unit according to an embodiment of the present disclosure;

FIG. 10 is a schematic block diagram illustrating another computing unit according to an embodiment of the present disclosure;

FIG. 11 is a simplified flow diagram illustrating a method of performing a calculation using an integrated circuit device according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed in the present disclosure without making any creative effort, shall fall within the protection scope of the solution disclosed in the present disclosure.

Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic structural diagram illustrating a board card 10 according to an embodiment of the disclosure. It should be understood that the configuration and composition shown in FIG. 1 are merely an example and are not intended to limit the aspects of the present disclosure in any way.

As shown in FIG. 1, the board card 10 includes a chip 101, which may be a system-on-chip (SoC) as described in the context of the present disclosure. In one implementation scenario, the chip may integrate one or more combined processing devices. The combined processing device may be an artificial-intelligence computing unit that supports various deep learning and machine learning algorithms, meeting the intelligent processing needs of complex scenarios in fields such as computer vision, speech, natural language processing, and data mining; deep learning technology in particular is applied extensively in the cloud intelligence field. A notable characteristic of cloud intelligence applications is the large input data volume, which places high demands on the storage and computing capacity of the platform. The board card 10 of this embodiment is suited to cloud intelligence applications, offering large off-chip storage, large on-chip storage, and strong computing capability.

As further shown in the figure, the chip 101 is connected to an external device 103 through an external interface device 102. The external device 103 may be, for example, a server, a computer, a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface, depending on the application scenario. Data to be processed may be transferred by the external device 103 to the chip 101 through the external interface device 102, and the calculation results of the chip 101 may be transmitted back to the external device 103 via the external interface device 102. The external interface device 102 may take different interface forms, such as a PCIe interface, depending on the application scenario.

The board card 10 may also include a storage device 104 for storing data, which includes one or more storage units 105. The storage device 104 is connected to the control device 106 and the chip 101 via a bus and transfers data with both. The control device 106 in the board card 10 may be configured to regulate the state of the chip 101. To this end, in one application scenario, the control device 106 may include a microcontroller unit (MCU).

FIG. 2 is a structural diagram of the combined processing device in the chip 101 of the above embodiment. As shown in FIG. 2, the combined processing device 20 may include a computing device 201, an interface device 202, a processing device 203, and a dynamic random access memory (DRAM) 204.

The computing device 201 may be configured to perform user-specified operations, primarily implemented as a single-core intelligent processor or a multi-core intelligent processor. In some operations, it may be used to perform calculations in terms of deep learning or machine learning, and may also interact with the processing means 203 through the interface means 202 to collectively complete the user-specified operations. In aspects of the present disclosure, the computing device may be configured to perform various tasks of the optimized neural network model, such as various operations that will be described later in the present disclosure.

The interface device 202 may be used to transfer data and control instructions between the computing device 201 and the processing device 203. For example, the computing device 201 may obtain input data from the processing device 203 via the interface device 202, and write to a storage device on the computing device 201. Further, the computing device 201 may obtain the control instruction from the processing device 203 via the interface device 202, and write the control instruction into a control cache on the computing device 201. Alternatively or optionally, the interface device 202 may also read data from a storage device of the computing device 201 and transmit the data to the processing device 203.

The processing device 203, as a general-purpose processing device, performs basic control including, but not limited to, data transfer and starting and/or stopping the computing device 201. Depending on the implementation, the processing device 203 may be one or more types of processor, such as a central processing unit (CPU), a graphics processing unit (GPU), or another general-purpose and/or special-purpose processor, including but not limited to a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and the number of processors may be determined according to actual needs. As previously mentioned, the computing device 201 of the present disclosure, considered on its own, may be viewed as having a single-core structure or a homogeneous multi-core structure. Considered together, however, the computing device 201 and the processing device 203 form a heterogeneous multi-core structure. According to aspects of the present disclosure, when implemented as a general-purpose processor, the processing device 203 may perform a compilation operation for optimizing the neural network model, compiling the neural network model into a sequence of binary instructions executable by the computing device.

The DRAM 204 is used to store data to be processed; it is typically a DDR memory with a capacity of 16 GB or more, used to store data of the computing device 201 and/or the processing device 203.

FIG. 3 shows the internal structure of the computing device 201 in its single-core form. The single-core computing device 301 processes input data in fields such as computer vision, speech, natural language processing, and data mining, and includes three modules: a control module 31, an operation module 32, and a storage module 33.

The control module 31 is used for coordinating and controlling the operations of the operation module 32 and the storage module 33 to complete the task of deep learning, and includes an Instruction Fetch Unit (IFU) 311 and an Instruction Decode Unit (IDU) 312. The instruction fetch unit 311 is used for obtaining an instruction from the processing device 203, and the instruction decode unit 312 decodes the obtained instruction and sends the decoded result to the operation module 32 and the storage module 33 as control information.

The operation module 32 includes a vector operation unit 321 and a matrix operation unit 322. The vector operation unit 321 performs vector operations and can support complex operations such as vector multiplication, addition, and nonlinear transformation; the matrix operation unit 322 is responsible for the core computations of deep learning algorithms, i.e., matrix multiplication and convolution. The storage module 33 stores or transports related data and includes a neuron storage unit (Neuron RAM, NRAM) 331, a parameter storage unit (Weight RAM, WRAM) 332, and a direct memory access module (DMA) 333. The NRAM 331 stores input neurons, output neurons, and intermediate results after computation; the WRAM 332 stores the convolution kernels, i.e., the weights, of the deep learning network; and the DMA 333 is connected to the DRAM 204 via the bus 34 and is responsible for data transfer between the single-core computing device 301 and the DRAM 204.
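The division of labor between the two operation units can be illustrated with a small sketch. All function names here are hypothetical, and ReLU is only an assumed example of a nonlinear transformation; the text does not specify which nonlinearity the vector unit applies.

```python
# Sketch of the operation module's division of labor: the matrix unit
# performs the core matrix multiplication (weights held in WRAM, neurons
# in NRAM), and the vector unit applies an elementwise nonlinear
# transformation afterwards (ReLU is an assumed example).

def matrix_unit(weights, neurons):
    """Plain matrix multiply: (rows x inner) @ (inner x cols)."""
    rows, inner, cols = len(weights), len(neurons), len(neurons[0])
    return [[sum(weights[i][k] * neurons[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

def vector_unit(matrix):
    """Elementwise nonlinear transform (ReLU here)."""
    return [[max(0.0, x) for x in row] for row in matrix]

out = vector_unit(matrix_unit([[1, -1]], [[2, 3], [4, 5]]))
```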

Fig. 4 shows a schematic diagram of the internal structure of the computing apparatus 201 with multiple cores. The multi-core computing device 41 is designed in a hierarchical structure, with the multi-core computing device 41 being a system on a chip that includes at least one cluster (cluster) according to the present disclosure, each cluster in turn including a plurality of processor cores. In other words, the multi-core computing device 41 is constructed in a system-on-chip-cluster-processor core hierarchy. In a system-on-chip hierarchy, as shown in FIG. 4, the multi-core computing device 41 includes an external storage controller 401, a peripheral communication module 402, an on-chip interconnect module 403, a synchronization module 404, and a plurality of clusters 405.

There may be multiple external memory controllers 401 (two are shown in the figure as an example), which respond to access requests issued by the processor cores and access the external storage device, i.e., the off-chip memory in the context of this disclosure (e.g., the DRAM 204 in FIG. 2), to read data from or write data to off-chip memory. The peripheral communication module 402 receives control signals from the processing device 203 through the interface device 202 and starts the computing device 201 to execute a task. The on-chip interconnect module 403 connects the external memory controllers 401, the peripheral communication module 402, and the plurality of clusters 405, and transmits data and control signals between these modules. The synchronization module 404 is a global synchronization barrier controller (GBC) that coordinates the progress of each cluster and ensures synchronization of information. The plurality of clusters 405 are the computing cores of the multi-core computing device 41. Although four clusters are shown in FIG. 4 as an example, as hardware evolves the multi-core computing device 41 of the present disclosure may also include 8, 16, 64, or even more clusters 405. In one application scenario, the clusters 405 may be used to efficiently execute deep learning algorithms.

At the cluster hierarchy, as shown in FIG. 4, each cluster 405 may include a plurality of processor cores (IPU cores) 406 and a memory core (MEM core) 407, which may include, for example, a cache memory (e.g., an LLC) as described in the context of the present disclosure.

Four processor cores 406 are shown in the figure as an example; the present disclosure does not limit their number. The internal architecture of a processor core is shown in FIG. 5. Each processor core 406 is similar to the single-core computing device 301 of FIG. 3 and likewise includes three modules: a control module 51 (including an instruction fetch unit 511 and an instruction decode unit 512), an operation module 52 (including a vector operation unit 521 and a matrix operation unit 522), and a storage module 53 (including NRAM 531, WRAM 532, IODMA 533, and MVDMA 534). The functions and structures of the control module 51, operation module 52, and storage module 53 are substantially the same as those of the control module 31, operation module 32, and storage module 33, and are not described again here. Of particular note, the storage module 53 may include an input/output direct memory access module (IODMA) 533 and a move direct memory access module (MVDMA) 534. The IODMA 533 controls access between the NRAM 531/WRAM 532 and the DRAM 204 through the broadcast bus 409; the MVDMA 534 controls access between the NRAM 531/WRAM 532 and the storage unit (SRAM) 408.

Returning to FIG. 4, the memory core 407 is primarily used for storage and communication: it stores shared data or intermediate results among the processor cores 406 and performs communication between the cluster 405 and the DRAM 204, communication among clusters 405, communication among processor cores 406, and the like. In other embodiments, the memory core 407 may have scalar-operation capability to perform scalar operations.

The memory core 407 may include a static random-access memory (SRAM) 408, a broadcast bus 409, a cluster direct memory access module (CDMA) 410, and a global direct memory access module (GDMA) 411. In one implementation scenario, the SRAM 408 may assume the role of a high-performance data transfer station: data multiplexed between different processor cores 406 within the same cluster 405 need not be fetched from the DRAM 204 by each processor core 406 individually, but is instead relayed among the processor cores 406 via the SRAM 408. The memory core 407 only needs to quickly distribute the multiplexed data from the SRAM 408 to the plurality of processor cores 406, improving inter-core communication efficiency and significantly reducing off-chip input/output accesses.
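The saving from the SRAM relay can be expressed as a simple accounting identity. This is an illustrative sketch with assumed counts, not measured figures: without the relay, every core that multiplexes a data block reads it from DRAM; with the relay, DRAM is read once and copies are distributed on-chip.

```python
# Illustrative accounting of off-chip reads for one multiplexed data
# block (assumed model, not measured): each of N cores reads DRAM
# directly, versus one DRAM read relayed through the shared SRAM.

def offchip_reads(num_cores, use_sram_relay):
    return 1 if use_sram_relay else num_cores

direct = offchip_reads(4, use_sram_relay=False)   # one DRAM read per core
relayed = offchip_reads(4, use_sram_relay=True)   # single DRAM read, on-chip fan-out
```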

Broadcast bus 409, CDMA 410, and GDMA 411 are used to perform communication among processor cores 406, communication among cluster 405, and data transfer between cluster 405 and DRAM 204, respectively. As will be described separately below.

The broadcast bus 409 is used to complete high-speed communication among the processor cores 406 in the cluster 405, and the broadcast bus 409 of this embodiment supports inter-core communication modes including unicast, multicast and broadcast. Unicast refers to point-to-point (e.g., from a single processor core to a single processor core) data transfer, multicast is a communication that transfers a copy of data from SRAM 408 to a particular number of processor cores 406, and broadcast is a communication that transfers a copy of data from SRAM 408 to all processor cores 406, which is a special case of multicast.
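The three inter-core communication modes can be modeled with one delivery routine, where the only difference is the target set. This is a hypothetical software sketch; the names are invented for illustration.

```python
# Sketch of the broadcast bus's three modes: unicast delivers to one
# core, multicast to a chosen subset, and broadcast (a special case of
# multicast) to every core. Each core is modeled as a list of received
# data copies.

def transfer(data, cores, targets):
    """Deliver a copy of `data` to each core index in `targets`."""
    for t in targets:
        cores[t].append(data)

cores = [[], [], [], []]
transfer("a", cores, [2])                 # unicast: a single destination
transfer("b", cores, [0, 1])              # multicast: a particular subset
transfer("c", cores, range(len(cores)))   # broadcast: all cores
```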

The CDMA 410 controls access to the SRAM 408 between different clusters 405 within the same computing device 201. The GDMA 411 cooperates with the external memory controller 401 to control access from the SRAM 408 of a cluster 405 to the DRAM 204, or to read data from the DRAM 204 into the SRAM 408. As can be seen from the foregoing, communication between the DRAM 204 and the NRAM 531 or WRAM 532 can be achieved in two ways. The first is to communicate directly between the DRAM 204 and the NRAM 531 or WRAM 532 through the IODMA 533; the second is to transfer data between the DRAM 204 and the SRAM 408 through the GDMA 411, and between the SRAM 408 and the NRAM 531 or WRAM 532 through the MVDMA 534. Although the second way involves more components and a longer data path, in some embodiments its bandwidth is substantially greater than that of the first, so it may be more efficient for communication between the DRAM 204 and the NRAM 531 or WRAM 532. The data transmission schemes described here are merely exemplary; those skilled in the art can flexibly select and adapt various data transmission schemes according to the specific hardware arrangement in light of the teachings of the present disclosure.
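The trade-off just described, a longer path that may nevertheless be faster, reduces to a bandwidth comparison. The sketch below is hypothetical: the function and the bandwidth figures are illustrative assumptions, since the text gives no concrete numbers.

```python
# Hedged sketch of the path choice for a DRAM<->NRAM/WRAM transfer:
# route via SRAM (GDMA + MVDMA) when that path's bandwidth exceeds the
# direct IODMA path's, despite crossing more components. Bandwidth
# values are illustrative assumptions.

def pick_path(direct_bw, via_sram_bw):
    return "GDMA+MVDMA via SRAM" if via_sram_bw > direct_bw else "IODMA direct"

choice = pick_path(direct_bw=100, via_sram_bw=400)  # SRAM path wins here
```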

In other embodiments, the functionality of the GDMA 411 and that of the IODMA 533 may be integrated in the same component. Although the present disclosure treats the GDMA 411 and the IODMA 533 as different components for convenience of description, implementations that achieve similar functions and technical effects remain within the scope of protection of the present disclosure. Further, the functions of the GDMA 411, IODMA 533, CDMA 410, and MVDMA 534 may be implemented by the same component.

From the above description in connection with the embodiments, those skilled in the art will understand that the present disclosure also discloses an electronic device or apparatus, which may include one or more of the above boards, one or more of the above chips and/or one or more of the above combined processing devices.

According to different application scenarios, the electronic device or apparatus of the present disclosure may include a server, a cloud server, a server cluster, a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a PC device, an internet of things terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a visual terminal, an autopilot terminal, a vehicle, a household appliance, and/or a medical device. The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph. The electronic device or apparatus of the present disclosure may also be applied to the fields of the internet, the internet of things, data centers, energy, transportation, public management, manufacturing, education, power grid, telecommunications, finance, retail, construction site, medical, and the like.

Further, the electronic device or apparatus of the present disclosure may also be used in application scenarios related to artificial intelligence, big data, and/or cloud computing, such as a cloud, an edge, and a terminal. In one or more embodiments, the electronic device or apparatus according to the present disclosure may be applied to a cloud device (e.g., a cloud server), and the electronic device or apparatus with low power consumption may be applied to a terminal device and/or an edge device (e.g., a smartphone or a camera). In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that appropriate hardware resources can be matched from the hardware resources of the cloud device to simulate the hardware resources of the terminal device and/or the edge device according to the hardware information of the terminal device and/or the edge device, and uniform management, scheduling and cooperative work of end-cloud integration or cloud-edge-end integration can be completed.

The hardware architecture and its internal structure of the present disclosure are described in detail above in conjunction with fig. 1-5. It is to be understood that the above description is intended to be illustrative, and not restrictive. According to different application scenarios and hardware specifications, those skilled in the art may also change the board card and the internal structure of the present disclosure, and these changes still fall into the protection scope of the present disclosure. The scheme of the present disclosure will be described in detail below.

Fig. 6 is a schematic diagram illustrating the structure of an integrated circuit device 600 according to an embodiment of the present disclosure. To illustrate the relationship between the integrated circuit device 600 and the external memory, an external memory 605 is also shown in fig. 6. The integrated circuit device 600 shown in fig. 6 may comprise a data interface 601 and a multi-stage computing unit, which may, for example, comprise a computing unit 602, a computing unit 603, …, and a computing unit 604, cascaded in sequence as shown in fig. 6. The data interface 601 may be configured to transfer data between the integrated circuit device and the external memory 605. In one application scenario, a Direct Memory Access ("DMA") interface may be used as the aforementioned data interface 601 to transmit data of the external memory 605 to a first-stage computing unit (e.g., the computing unit 602 explicitly shown in fig. 6) among the multi-stage computing units.

In one embodiment, the multi-stage computing units may include a first-stage computing unit (e.g., the computing unit 602 in fig. 6) and multiple stages of downstream computing units (e.g., the computing unit 603, …, and the computing unit 604 in fig. 6). Based on the requirements of different application scenarios, the number of stages of computing units may be any positive integer greater than or equal to 2, such as 2, 3, or 5 stages, so as to fully utilize internal resources to perform an operation, thereby significantly improving the efficiency of the operation.

Based on the above structure, each stage of computing unit may be configured to receive first data, and perform calculation based on the first data and pre-stored second data to obtain a calculation result. In the embodiment shown in fig. 6, the first-stage computing unit 602 may be configured to receive the first data from the external memory 605 via the data interface 601. Each stage of the multi-stage downstream computing units may be configured to receive the first data from its previous-stage computing unit; for example, the computing unit 603 may receive the first data from the computing unit 602, the computing unit 604 may receive the first data from its previous-stage downstream computing unit, and so on.

Correspondingly, the other computing units except the last computing unit 604 in the multi-stage computing unit may be further configured to send the first data to the next computing unit, so that the next computing unit performs corresponding computation according to the first data and the pre-stored second data. For example, the calculation unit 602 may transmit the first data to the calculation unit 603, and the calculation unit 603 may transmit the first data to the calculation unit 604.

In one implementation scenario, every two adjacent computing units may be communicatively connected through two DMA buses operating in full duplex, with a bandwidth of, for example, up to 128 GB/s.

The first data mentioned above may, for example, comprise neuron data in a neural network, and the second data may, for example, comprise weight data in the neural network. Further, the first data and the second data may be vectors, matrices, or multidimensional (three-dimensional, four-dimensional, or higher) data. The first data and/or the second data of the present disclosure may each include one or more data blocks according to different data structures (e.g., different data placement rules). In one operational scenario, when the integrated circuit device of the present disclosure is used for matrix computation, the first data and/or the second data may also be a block of a certain size in the matrix, for block-wise parallel computation of the matrix.
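To make the block-wise computation above concrete, the following pure-Python sketch multiplies tiles of two matrices and accumulates the tile products along the shared dimension, which is the role a first-data/second-data block pair plays in a blocked matrix multiplication. The function names and tile shapes are illustrative assumptions, not an interface of the disclosed device:

```python
# Block-wise matrix multiplication sketch: each (first-data, second-data) pair
# is one tile of A and one tile of B; tiles along the shared inner dimension
# are accumulated into the output tile.

def matmul_block(a, b):
    """Multiply one tile of A (list of rows) by one tile of B."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def add_block(c, d):
    """Element-wise accumulation of two output tiles."""
    return [[x + y for x, y in zip(rc, rd)] for rc, rd in zip(c, d)]

# Two tile pairs along the inner dimension stand in for first/second data blocks:
a_tiles = [[[1, 2]], [[3, 4]]]        # 1x2 tiles of a 1x4 matrix A
b_tiles = [[[1], [1]], [[1], [1]]]    # 2x1 tiles of a 4x1 matrix B
c = [[0]]
for a_t, b_t in zip(a_tiles, b_tiles):
    c = add_block(c, matmul_block(a_t, b_t))
print(c)  # [[10]], i.e. 1+2+3+4
```

Because each tile product is independent, the tile pairs can be distributed across computing units and accumulated afterwards, which is what makes the blocking suitable for parallel computation.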

In an implementation scenario, each stage of computing unit may perform a corresponding computing task according to the first data and the pre-stored second data to obtain a calculation result, based on the requirements of different application scenarios. The computing task may be, for example, one or any combination of the convolution operation, matrix-multiply-matrix operation, matrix-multiply-vector operation, bias operation, fully connected operation, GEMM operation, GEMV operation, and activation operation involved in the artificial intelligence field (e.g., neural networks).

To better understand how the integrated circuit device of the present disclosure performs operations, the data calculation method of an embodiment of the present disclosure is described below by taking the integrated circuit device 700 shown in fig. 7 as an example. For simplicity of description, the integrated circuit device 700 is shown in fig. 7 as including only three stages of computing units for illustrative purposes. As shown in fig. 7, the three stages of computing units comprise a computing unit 702, a computing unit 703, and a computing unit 704, which are sequentially cascaded. Among the three stages, the computing unit 702 is the first-stage computing unit, and the computing unit 703 and the computing unit 704 are downstream computing units.

Assume that the integrated circuit device 700 performs computations related to a neural network, and that the input data of the neural network is ina, which is stored in an external memory such as the external memory 705. The computing unit 702 stores second data b1 in advance, the computing unit 703 stores second data b2 in advance, and the computing unit 704 stores second data b3 in advance. In the initial stage of the operation, the external memory 705 may transmit the input data ina to the computing unit 702 via the data interface 701 as the first data received by the computing unit 702. After obtaining the first data ina, the computing unit 702 may perform a corresponding computing task (e.g., a convolution operation) according to the first data ina and the second data b1 in the computing unit 702, so as to obtain a calculation result c1. In addition, to facilitate the calculation by the computing unit 703, the computing unit 702 also sends the first data ina to the computing unit 703.

After obtaining the first data ina, the computing unit 703, in a manner similar to that of the computing unit 702, performs a corresponding computing task (e.g., a matrix-multiply-matrix operation) according to the first data ina and the second data b2 in the computing unit 703, to obtain a calculation result c2. In addition, the computing unit 703 also sends the first data ina to the computing unit 704, so that the latter performs a corresponding calculation (e.g., a GEMM operation) according to the first data ina and the second data b3 therein, resulting in a calculation result c3. It will be understood by those skilled in the art that the computing tasks performed in each of the computing units described above are merely exemplary and not limiting, and other computing tasks listed above may also be performed based on different application scenarios, which will not be described herein again.
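The data flow of the three-stage example can be sketched as follows. The function and task names are assumptions made for illustration, not the device's actual interface: each stage combines the forwarded first data ina with its own pre-stored second data using its own computing task, and ina crosses the external data interface only once.

```python
# Cascade sketch: `stages` is a list of (second_data, task) pairs, one per
# computing unit, in cascade order. Each stage computes ci = task(ina, bi);
# in hardware, `ina` would also be forwarded to the next-stage unit over the
# on-chip bus instead of being re-read from external memory.

def run_cascade(ina, stages):
    """Return the per-stage calculation results [c1, c2, c3, ...]."""
    results = []
    for b, task in stages:
        results.append(task(ina, b))  # stage computes with its local data bi
    return results

# Toy scalar stand-ins for the convolution / matrix-multiply / GEMM tasks:
stages = [(2, lambda a, b: a + b),    # stage 1 with pre-stored b1
          (3, lambda a, b: a * b),    # stage 2 with pre-stored b2
          (4, lambda a, b: a - b)]    # stage 3 with pre-stored b3
print(run_cascade(10, stages))        # [12, 30, 6]
```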

The architecture and functionality of the integrated circuit device of the present disclosure are described above in connection with fig. 6 and 7. According to the scheme of the present disclosure, the operation data can be sequentially transmitted by utilizing the multi-stage architecture of the multi-stage computing unit, so that the data throughput of the integrated circuit device for data interaction with the external memory through the I/O bus is significantly reduced, thereby overcoming the bottleneck problem of the I/O bus bandwidth. In addition, this architecture and data transmission mode also make full use of the high-speed on-chip bandwidth between the computing units, thereby improving the efficiency of data transmission.

In order to further reduce the data exchange between the integrated circuit device and the external memory, thereby alleviating the I/O bottleneck problem caused by the external bandwidth limitation, each stage of computing unit may further comprise a master computing unit and a plurality of slave computing units. Based thereon, the master computing unit may be configured to receive first data and to send the first data to a corresponding plurality of slave computing units. The master computing unit of the first-stage computing unit receives the first data from the external memory via the data interface, and the master computing unit of each stage of downstream computing unit receives the first data from its previous-stage computing unit. In addition, the master computing units of the aforementioned other computing units (i.e., the computing units other than the last-stage computing unit among the multi-stage computing units) may be further configured to send the first data to the next-stage computing unit.

Further, the master computing unit may be further configured to receive intermediate results from the corresponding plurality of slave computing units, and to calculate the calculation result from the plurality of intermediate results. Correspondingly, each of the plurality of slave computing units may be configured to perform calculation based on the first data and pre-stored second data to obtain an intermediate result, and to send the intermediate result to the master computing unit.

The architecture and function of the computing unit 800 shown in fig. 8 are described below as an example. As can be seen from fig. 8, the computing unit 800 may comprise a master computing unit 801, a slave computing unit 802, a slave computing unit 803, …, and a slave computing unit 804. The number of slave computing units may be set to different numbers according to the requirements of different application scenarios, for example, three, four, or seven, which is not limited by the embodiments of the present disclosure.

Based on this, the master computing unit 801 may be configured to receive first data (e.g., the aforementioned input data ina) and divide the first data ina into a plurality of first sub-data a11, a12, …, a13 matching (e.g., equal to) the number of slave computing units. Next, the master computing unit 801 transmits the first sub-data a11, a12, …, a13 to the slave computing unit 802, the slave computing unit 803, …, and the slave computing unit 804, respectively.

Regarding the source of the first data, assuming that the computing unit 800 is a first-level computing unit, it may receive the first data ina from the external memory via the data interface. Assuming that the computing unit 800 is a downstream computing unit, it may receive the first data ina from a higher-level computing unit.

Assume that the second data pre-stored in the computing unit 800 is still b1. Corresponding to the above division of the first data ina, the embodiment of the present disclosure may divide the second data b1 into a plurality of second sub-data b11, b12, …, b13 corresponding to (e.g., equal to) the number of slave computing units and store them in the corresponding slave computing units, respectively, so as to perform the corresponding calculations with the received first sub-data. In the present embodiment, the second sub-data b11 is stored in the slave computing unit 802, the second sub-data b12 is stored in the slave computing unit 803, …, and the second sub-data b13 is stored in the slave computing unit 804.

After the above operations of data division and transmission are performed, the slave computing unit 802 may perform a corresponding computing subtask according to the first sub-data a11 and the second sub-data b11, obtaining an intermediate result c11. The slave computing unit 803 may perform a corresponding computing subtask according to the first sub-data a12 and the second sub-data b12, obtaining an intermediate result c12. By analogy, the slave computing unit 804 may perform a corresponding computing subtask according to the first sub-data a13 and the second sub-data b13, obtaining an intermediate result c13. After obtaining the plurality of intermediate results, the slave computing unit 802, the slave computing unit 803, …, and the slave computing unit 804 may send the corresponding intermediate results c11, c12, …, c13 to the master computing unit 801. The master computing unit 801 may then calculate (e.g., sum) the calculation result c1 from the plurality of intermediate results c11, c12, …, c13.
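The master/slave split-and-reduce above can be sketched with a dot product as the computing task. This is a minimal illustration under assumed names (`split`, `master_compute` are not the hardware API): the master divides the first data among the slaves, each slave combines its chunk with its pre-stored chunk of the second data, and the master sums the intermediate results.

```python
# Master/slave sketch: divide the first data into per-slave sub-data, let each
# slave compute an intermediate result with its pre-stored second sub-data,
# then reduce the intermediates on the master.

def split(data, parts):
    """Divide `data` into `parts` roughly equal contiguous chunks."""
    k, m = divmod(len(data), parts)
    return [data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(parts)]

def master_compute(first_data, slave_second_data):
    """`slave_second_data[i]` plays the role of slave i's pre-stored b1i."""
    num_slaves = len(slave_second_data)
    first_sub = split(first_data, num_slaves)      # a11, a12, ..., a13
    # each slave produces an intermediate result c1i from (a1i, b1i)
    intermediates = [sum(x * y for x, y in zip(a, b))
                     for a, b in zip(first_sub, slave_second_data)]
    return sum(intermediates)                      # master reduces to c1

a = [1, 2, 3, 4]                                   # first data ina
b_chunks = split([5, 6, 7, 8], 2)                  # second data b1, pre-split
print(master_compute(a, b_chunks))                 # 70 = 1*5 + 2*6 + 3*7 + 4*8
```

Note that the intermediates never leave the computing unit, which mirrors why the scheme avoids round trips to the external memory.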

When the computing unit 800 is not the last stage of the multi-stage computing units of the integrated circuit device, the master computing unit 801 may further send the first data ina to the next-stage computing unit, so that the next-stage computing unit performs calculation based on the first data ina and its pre-stored second data to obtain a calculation result.

According to the above description, since the intermediate result of each computing unit can be stored in the corresponding computing unit without being stored in the external memory, data exchange with the external memory can be reduced, thereby further reducing the I/O bottleneck caused by external bandwidth limitation. In addition, the computing task of the computing unit is divided into a plurality of computing sub-tasks, so that the data computing speed is increased, and the data processing efficiency is improved.

The above embodiments have described, merely by way of example, the case where the first data and the second data are divided into first sub-data and second sub-data equal in number to the slave computing units. It will be appreciated by those skilled in the art that in some application scenarios, the first data and the second data may be divided in other ways, or even not divided at all. In short, the present disclosure may also use only one of the plurality of slave computing units (e.g., the slave computing unit 802 in fig. 8) to perform the calculation according to the first data and the second data to obtain the calculation result of the computing unit.

The architecture and data computation flow of the computing unit are described above with reference to the drawings. In order to further reduce data exchange with the external memory, each of the plurality of slave computing units may also include a master computing subunit and a plurality of slave computing subunits. Based on this, the master computing subunit may be configured to receive the first data from the corresponding master computing unit and to send the first data to the plurality of slave computing subunits. In addition, the master computing subunit may be further configured to receive intermediate sub-results from a plurality of slave computing subunits, compute the intermediate results according to the plurality of intermediate sub-results, and send the intermediate results to the corresponding master computing unit. Wherein the corresponding primary computing unit is a primary computing unit that sends the first data to the primary computing subunit. Correspondingly, each slave computing subunit of the plurality of slave computing subunits may be configured to perform a computation according to the first data and pre-stored second data to obtain the intermediate sub-result, and send the intermediate sub-result to the master computing subunit.

Based on different application scenarios, the master computing subunit may transmit the first data to the plurality of slave computing subunits via a plurality of transmission paths. For example, in one implementation scenario, the master computing subunit may be configured to broadcast the first data according to the output channel of the first data, so that the plurality of slave computing subunits acquire the first data. In another implementation scenario, the master computing subunit may be further configured to send or broadcast the first data to the plurality of slave computing subunits in each dimension in which the first data is output, so that the plurality of slave computing subunits acquire the first data.

The present disclosure will now take the slave computing unit 900 shown in fig. 9 as an example to explain the architecture and function of the slave computing unit. As can be seen from fig. 9, the slave computing unit 900 may comprise a master computing subunit 901 and a plurality of slave computing subunits 902, 903, …, and 904. The number of slave computing subunits may be set to different numbers according to the requirements of different application scenarios, for example, two, four, or six, which is not limited by the embodiments of the present disclosure.

Based on this, the master computing subunit 901 may be configured to receive the first data (e.g., the aforementioned first sub-data a11) from the corresponding master computing unit and, in a data processing manner similar to that of the aforementioned computing units, divide the first sub-data a11 into a plurality of secondary sub-data a111, a112, …, a113 matching (e.g., equal to) the number of slave computing subunits. Next, the master computing subunit 901 may send the plurality of secondary sub-data a111, a112, …, a113 to the slave computing subunit 902, the slave computing subunit 903, …, and the slave computing subunit 904, respectively.

Assume that the second sub-data pre-stored in the slave computing unit 900 is b11. Based on this, the scheme of the present disclosure may divide the second sub-data b11 into a plurality of secondary sub-data b111, b112, …, b113 corresponding to (e.g., equal to) the number of slave computing subunits, and store them in the corresponding slave computing subunits, respectively. In the present embodiment, the secondary sub-data b111 is stored in the slave computing subunit 902, the secondary sub-data b112 is stored in the slave computing subunit 903, …, and the secondary sub-data b113 is stored in the slave computing subunit 904.

After the above operations of data division and transmission are performed, the slave computing subunit 902 may perform a corresponding secondary computing subtask (i.e., a secondary subtask of the computing subtask performed by the slave computing unit 900) according to the secondary sub-data a111 and the secondary sub-data b111, obtaining an intermediate sub-result c111. The slave computing subunit 903 may perform a corresponding secondary computing subtask according to the secondary sub-data a112 and the secondary sub-data b112, obtaining an intermediate sub-result c112. By analogy, the slave computing subunit 904 may perform a corresponding secondary computing subtask according to the secondary sub-data a113 and the secondary sub-data b113, obtaining an intermediate sub-result c113. After the plurality of intermediate sub-results are obtained, the slave computing subunit 902, the slave computing subunit 903, …, and the slave computing subunit 904 may respectively send the corresponding intermediate sub-results c111, c112, …, c113 to the master computing subunit 901. The master computing subunit 901 may then calculate the intermediate result c11 from the intermediate sub-results c111, c112, …, c113 and send it to the corresponding master computing unit. Assuming that the slave computing unit 900 in fig. 9 is the slave computing unit 802 in fig. 8, its corresponding master computing unit may be the master computing unit 801 in fig. 8.
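The two-level hierarchy above is the same split-and-reduce pattern applied recursively: the slave computing unit splits its subtask among slave computing subunits and reduces their intermediate sub-results, just as the master computing unit does one level up. A hedged sketch with a dot product as the task (the `fanout` parameter and function name are illustrative assumptions, not the hardware interface):

```python
# Recursive split-and-reduce over a unit hierarchy. `fanout` lists how many
# sub-units each level splits into, e.g. [3, 2] means three slave computing
# units, each with two slave computing subunits; [] is a leaf that computes
# directly on its local (first, second) data chunk.

def hierarchical_dot(a, b, fanout):
    if not fanout:                       # leaf: a single computing subunit
        return sum(x * y for x, y in zip(a, b))
    n = fanout[0]
    k, m = divmod(len(a), n)
    bounds = [(i * k + min(i, m), (i + 1) * k + min(i + 1, m))
              for i in range(n)]
    # each sub-unit returns its intermediate (sub-)result; this unit reduces
    return sum(hierarchical_dot(a[lo:hi], b[lo:hi], fanout[1:])
               for lo, hi in bounds)

a = list(range(1, 7))                    # first data: 1..6
b = [1] * 6                              # second data, pre-split per leaf
print(hierarchical_dot(a, b, [3, 2]))    # 21
```

At every level only the reduced result moves upward, so no intermediate sub-result needs to touch the external memory.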

As can be seen from the above description, since the intermediate sub-result of each slave computing unit of the present disclosure can also be stored in the corresponding slave computing unit without being stored in the external memory, data exchange with the external memory can be further reduced, and further I/O bottleneck caused by external bandwidth limitation is reduced. In addition, the computing subtasks of the slave computing unit are further divided into a plurality of secondary computing subtasks, so that the data computing speed is further increased, and the data processing efficiency is further improved.

The above embodiments have described, merely by way of example, the case where the first sub-data and the second sub-data are divided into secondary sub-data equal in number to the slave computing subunits. Those skilled in the art will appreciate that, in certain application scenarios, the first sub-data and the second sub-data may be divided in other manners, or even not divided at all. In short, the present disclosure may also use only one of the plurality of slave computing subunits (e.g., the slave computing subunit 902 in fig. 9) to perform the calculation according to the first sub-data and the second sub-data to obtain the intermediate result of the slave computing unit.

In order to facilitate subsequent invocations and performing relevant calculations, in one embodiment each of the plurality of main computing subunits may be further configured to send the intermediate results to the external memory for storage.

Various information interactions of the computing unit are described above in connection with various embodiments. It will be appreciated by those skilled in the art that, to achieve reliable interaction of the above information, each stage of the multi-stage computing units may further comprise a control unit. The control unit may be configured to control the information interaction between the master computing unit and the plurality of slave computing units within that stage of computing unit. A schematic architecture diagram of a computing unit 1000 is shown in fig. 10. As can be seen from the figure, the computing unit 1000 may comprise a master computing unit 1001, a slave computing unit 1002, a slave computing unit 1003, …, and a slave computing unit 1004. The computing unit 1000 further comprises a control unit 1005, and the control unit 1005 may control the master computing unit 1001 to transmit the first data (or the first sub-data described in the foregoing embodiments) to the slave computing unit 1002, the slave computing unit 1003, …, and the slave computing unit 1004. Accordingly, the control unit 1005 may also control the slave computing unit 1002, the slave computing unit 1003, …, and the slave computing unit 1004 to feed back their intermediate results to the master computing unit 1001, so that reliable information interaction between the master computing unit 1001 and the respective slave computing units can be achieved.

FIG. 11 is a simplified flow diagram illustrating a method of performing a calculation using an integrated circuit device according to an embodiment of the present disclosure. From the foregoing, it will be appreciated that the integrated circuit device here may be the integrated circuit device described in connection with the foregoing embodiments, which has the internal structure described above and supports the operations described above.

As shown in fig. 11, the method 1100 may include receiving first data at step S1101, wherein the first-stage computing unit receives the first data from the external memory via the data interface, and each stage of the multi-stage downstream computing units receives the first data from its previous-stage computing unit. In addition, the computing units other than the last-stage computing unit among the multi-stage computing units may send the first data to the next-stage computing unit. Next, at step S1102, the method 1100 performs calculation according to the first data and pre-stored second data to obtain a calculation result.

The calculation method of the present disclosure is described above only in conjunction with fig. 11 for the sake of simplicity. Those skilled in the art can also appreciate that the method may include more steps according to the disclosure of the present disclosure, and the execution of the steps may implement various operations of the present disclosure described in conjunction with the embodiments, and thus, the detailed description is omitted here.

It is noted that for the sake of brevity, this disclosure presents some methods and embodiments thereof as a series of acts or combinations thereof, but those skilled in the art will appreciate that the aspects of the disclosure are not limited by the order of the acts described. Accordingly, one of ordinary skill in the art will appreciate that certain steps may be performed in other sequences or simultaneously, in accordance with the disclosure or teachings of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in this disclosure are capable of being practiced in other than the specifically disclosed embodiments, and that the acts or modules illustrated herein are not necessarily required to practice one or more aspects of the disclosure. In addition, the present disclosure may focus on the description of some embodiments, depending on the solution. In view of the above, those skilled in the art will understand that portions of the disclosure that are not described in detail in one embodiment may also be referred to in the related description of other embodiments.

In particular implementation, based on the disclosure and teachings of the present disclosure, one skilled in the art will appreciate that several embodiments disclosed in the present disclosure may be implemented in other ways not disclosed herein. For example, as for the units in the foregoing embodiments of the electronic device or apparatus, the units are divided based on the logic functions, and there may be other dividing manners in actual implementation. Also for example, multiple units or components may be combined or integrated with another system or some features or functions in a unit or component may be selectively disabled. The connections discussed above in connection with the figures may be direct or indirect couplings between the units or components in terms of connectivity between the different units or components. In some scenarios, the aforementioned direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustic, magnetic, or other forms of signal transmission.

In the present disclosure, units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units. The aforementioned components or units may be co-located or distributed across multiple network elements. In addition, according to actual needs, part or all of the units can be selected to achieve the purpose of the scheme of the embodiment of the disclosure. In addition, in some scenarios, multiple units in embodiments of the present disclosure may be integrated into one unit or each unit may exist physically separately.

In some implementation scenarios, the integrated units may be implemented in the form of software program modules. If implemented in the form of software program modules and sold or used as a stand-alone product, the integrated units may be stored in a computer-readable memory. In this regard, when aspects of the present disclosure are embodied in the form of a software product (e.g., a computer-readable storage medium), the software product may be stored in a memory and may include instructions for causing a computer device (e.g., a personal computer, a server, or a network device) to perform some or all of the steps of the methods described in the embodiments of the present disclosure. The memory may include, but is not limited to, a USB flash drive, a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

In other implementation scenarios, the integrated unit may also be implemented in hardware, that is, as a specific hardware circuit, which may include digital circuits and/or analog circuits, etc. The physical implementation of the hardware structure of the circuit may include, but is not limited to, physical devices, which may include, but are not limited to, devices such as transistors or memristors. In view of this, the various devices described herein (e.g., computing devices or processing devices) may be implemented by suitable hardware processors, such as CPUs, GPUs, FPGAs, DSPs, ASICs, and the like. Further, the aforementioned storage unit or storage device may be any suitable storage medium (including a magnetic storage medium or a magneto-optical storage medium, etc.), and may be, for example, a Resistive Random Access Memory (RRAM), a Dynamic Random Access Memory (DRAM), a Static Random Access Memory (SRAM), an Enhanced Dynamic Random Access Memory (EDRAM), a High Bandwidth Memory (HBM), a Hybrid Memory Cube (HMC), a ROM, a RAM, or the like.

The foregoing may be better understood in light of the following clauses:

clause 1, an integrated circuit device, comprising:

a data interface configured to transfer data between the integrated circuit device and an external memory; and

a multi-stage computing unit comprising a first stage computing unit and a multi-stage downstream computing unit cascaded in sequence, and wherein each stage computing unit is configured to:

receiving first data, wherein the first-stage computing unit is configured to receive the first data from the external memory via the data interface, each stage of the multi-stage downstream computing units is configured to receive the first data from a previous-stage computing unit, and the computing units other than a last-stage computing unit of the multi-stage computing unit are configured to send the first data to a next-stage computing unit; and

performing calculation according to the first data and pre-stored second data to obtain a calculation result.

Clause 2, the integrated circuit device of clause 1, wherein each level of computational units comprises a master computational unit and a plurality of slave computational units, wherein:

the master computing unit is configured to:

receiving first data and sending the first data to a corresponding plurality of slave computing units, wherein the master computing unit of the first-stage computing unit receives the first data from the external memory via the data interface, the master computing unit of each stage of downstream computing unit receives the first data from a previous-stage computing unit, and the master computing units of the other computing units are further configured to send the first data to a next-stage computing unit; and

receiving intermediate results from the corresponding plurality of slave computing units and calculating the calculation result from the plurality of intermediate results,

each slave computing unit is configured to:

performing calculation according to the first data and pre-stored second data to obtain an intermediate result; and

sending the intermediate result to the master computing unit.

Clause 3, the integrated circuit device of clause 2, wherein each of the plurality of slave computing units comprises a master computing subunit and a plurality of slave computing subunits, wherein:

the main computing subunit is configured to:

receiving the first data from the corresponding master computing unit and sending the first data to the plurality of slave computing subunits; and

receiving intermediate sub-results from the plurality of slave computing subunits, computing the intermediate result according to the intermediate sub-results, and sending the intermediate result to the corresponding master computing unit, wherein the corresponding master computing unit is the master computing unit that sent the first data to the master computing subunit;

each slave computing subunit is configured to:

calculating according to the first data and second data stored in advance to obtain the intermediate sub-result; and

sending the intermediate sub-result to the master computing subunit.

Clause 4, the integrated circuit device of clause 3, wherein each of the plurality of master computing subunits is further configured to send the intermediate result to the external memory for storage.

Clause 5, the integrated circuit device of clause 3 or 4, wherein the master computing subunit is further configured to broadcast the first data according to an output channel of the first data, such that the plurality of slave computing subunits obtain the first data.

Clause 6, the integrated circuit device of clause 3 or 4, wherein the master computing subunit is further configured to send or broadcast the first data to the plurality of slave computing subunits according to respective output dimensions of the first data, such that the plurality of slave computing subunits obtain the first data.
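Clauses 5 and 6 suggest partitioning work along the output channels or output dimensions of the first data. One plausible reading, sketched below under stated assumptions, is that each slave computing subunit owns the pre-stored second data (for instance, weights) of one output channel, while the same first data is broadcast to all of them. The one-channel-per-subunit mapping and all names here are assumptions for illustration, not the disclosed design.

```python
# Illustrative broadcast of the first data along output channels.
# Each slave computing subunit is responsible for one output channel
# and holds that channel's pre-stored second data (e.g. weights).
# The channel-per-subunit mapping is an assumption for illustration.

def broadcast_by_output_channel(first_data, per_channel_second_data):
    # The master subunit broadcasts the same first data to every
    # slave subunit; each subunit produces the intermediate
    # sub-result for its own output channel.
    sub_results = []
    for second_data in per_channel_second_data:  # one entry per channel
        sub_results.append(
            sum(a * b for a, b in zip(first_data, second_data))
        )
    # The master subunit gathers one intermediate sub-result per channel.
    return sub_results
```

Broadcasting along the output dimension means the first data is transferred into the stage once and reused by every subunit, rather than being fetched once per channel.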

Clause 7, the integrated circuit device of any of clauses 2-4, wherein each stage of the multi-stage computational unit further comprises:

a control unit configured to control information interaction between the master computing unit and the plurality of slave computing units within that stage of the computing units.

Clause 8, an electronic device, comprising the integrated circuit apparatus of any one of clauses 1-7.

Clause 9, a board card comprising the integrated circuit device of any one of clauses 1-7.

Clause 10, a method of performing a computation using an integrated circuit device, wherein the integrated circuit device comprises a data interface for transferring data between the integrated circuit device and an external memory, and a multi-stage computing unit, wherein the multi-stage computing unit comprises a first-stage computing unit and multiple stages of downstream computing units cascaded in sequence, the method comprising performing the following at each stage of the computing units:

receiving first data, wherein the first-stage computing unit receives the first data from the external memory via the data interface, each stage of the downstream computing units receives the first data from its previous-stage computing unit, and the computing units other than the last-stage computing unit also send the first data to the next-stage computing unit; and

calculating according to the first data and second data stored in advance to obtain a calculation result.

It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present disclosure are used to distinguish between different objects and are not used to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.

As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [described condition or event]" or "in response to detecting [described condition or event]".

While various embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will occur to those skilled in the art without departing from the spirit and scope of the present disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is intended that the following claims define the scope of the disclosure and that equivalents or alternatives within the scope of these claims be covered thereby.
