Content addressable memory with spin orbit torque device
Abstract: This technology, "Content addressable memory with spin orbit torque device," was designed and created by W. H. Choi and J. Kim on 2019-06-03. The invention provides a content addressable memory with a spin orbit torque device. Ternary Content Addressable Memory (TCAM) circuits are provided herein. In one exemplary embodiment, a TCAM circuit may include a first Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) element having a pinned layer coupled to a first read transistor controlled by a first search line and having a Spin Hall Effect (SHE) layer coupled in a first configuration across a supplemental write input. The TCAM circuit may include a second SOT MTJ element having a pinned layer coupled to a second read transistor controlled by a second search line and having a SHE layer coupled in a second configuration across the supplemental write input. The TCAM circuit may include: a bias transistor configured to provide a bias voltage to drain terminals of the first read transistor and the second read transistor; and a voltage keeper element coupling the drain terminals to the match indicator line.
1. A Content Addressable Memory (CAM) circuit, comprising:
a first Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) element having a pinned layer coupled to a first read transistor controlled by a first search line and having a Spin Hall Effect (SHE) layer coupled in a first configuration across a supplemental write input;
a second SOT MTJ element having a pinned layer coupled to a second read transistor controlled by a second search line and having a SHE layer coupled in a second configuration across the supplemental write input;
a bias transistor configured to connect a bias voltage to drain terminals of the first read transistor and the second read transistor; and
a voltage keeper element coupling the drain terminals to a match indicator line.
2. The CAM circuit of claim 1, comprising:
the SHE layer of the first SOT MTJ element, the SHE layer coupled in the first configuration across the supplemental write input by a first write control transistor controlled by a first write control line; and
a SHE layer of the second SOT MTJ element, the SHE layer coupled in the second configuration across the supplemental write input by a second write control transistor controlled by a second write control line.
3. The CAM circuit of claim 2, comprising:
a control circuit configured to write data into the first SOT MTJ element by at least enabling the first write control transistor to establish a first current through the SHE layer of the first SOT MTJ element, the first current changing a magnetization state of the first SOT MTJ element according to the data; and
the control circuit is configured to write data into the second SOT MTJ element by at least enabling the second write control transistor to establish a second current through the SHE layer of the second SOT MTJ element that changes a magnetization state of the second SOT MTJ element according to a complementary version of the data.
4. The CAM circuit of claim 1, comprising:
a precharge element coupling the match indicator line to a predetermined voltage according to a precharge control signal.
5. The CAM circuit of claim 1, comprising:
a control circuit configured to precharge the match indicator line to a predetermined voltage and disable the first read transistor and the second read transistor; and
after precharging the match indicator line, the control circuit is configured to evaluate a data match state between the first SOT MTJ element and the second SOT MTJ element by enabling at least the first read transistor and the second read transistor when supplemental search data is present on the first search line and the second search line to responsively output a match result voltage on the match indicator line representative of the data match state.
6. The CAM circuit of claim 5, wherein the match result voltage comprises the predetermined voltage when a match is determined between the search data and data stored in the first SOT MTJ and the second SOT MTJ, wherein the match result voltage comprises a voltage level lower than the predetermined voltage when a mismatch is determined between the search data and data stored in the first SOT MTJ and the second SOT MTJ.
7. The CAM circuit of claim 1, comprising:
a separate write path and read path arrangement, wherein the write path includes the supplemental write input coupled to the SHE layer of the first SOT MTJ and the SHE layer of the second SOT MTJ, and wherein the read path includes the first read transistor controlled by the first search line and the second read transistor controlled by the second search line.
8. The CAM circuit of claim 1, wherein the SHE layer of the first SOT MTJ element and the SHE layer of the second SOT MTJ element each comprise a Spin Hall Metal (SHM) material comprising one of beta (β)-tungsten and β-tantalum.
9. The CAM circuit of claim 1, wherein the first search line and the second search line accept tri-state inputs.
10. A Ternary Content Addressable Memory (TCAM) array, comprising:
a plurality of TCAM cells arranged in columns and rows, wherein the TCAM cells of each column are coupled via an associated search line and the TCAM cells of each row are coupled via an associated match line;
each of the plurality of TCAM cells includes:
a first Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) element having a pinned layer coupled to a first search control transistor controlled by a first search line and having a Spin Hall Effect (SHE) layer coupled in a first configuration across a supplemental write input;
a second SOT MTJ element having a pinned layer coupled to a second search control transistor controlled by a second search line and having a SHE layer coupled in a second configuration across the supplemental write input;
a bias transistor configured to provide a bias voltage to drain terminals of the first search control transistor and the second search control transistor; and
a voltage keeper element coupling the drain terminals to a corresponding match line.
11. The TCAM array of claim 10, each of the plurality of TCAM cells further comprising:
the SHE layer of the first SOT MTJ element, the SHE layer coupled to the supplemental write input in the first configuration through a first write control transistor controlled by a first write control line; and
the SHE layer of the second SOT MTJ element coupled to the supplemental write input in the second configuration through a second write control transistor controlled by a second write control line.
12. The TCAM array of claim 11, comprising:
a control circuit configured to write data to the TCAM cell by at least:
enabling a first write control transistor of the TCAM cell to establish a first current through the SHE layer of the first SOT MTJ element that changes a magnetization state of the TCAM cell according to the data; and
enabling a second write control transistor of the TCAM cell to establish a second current through the SHE layer of the second SOT MTJ element that changes a magnetization state of the TCAM cell according to a complementary version of the data.
13. The TCAM array of claim 10, each of the plurality of TCAM cells further comprising:
a precharge element to couple the corresponding match line to a predetermined voltage in response to a precharge control signal.
14. The TCAM array of claim 10, comprising:
a control circuit configured to search for data in the TCAM cell by at least:
after precharging the associated match line to a predetermined voltage, when search data is present on the associated search lines, evaluating a data match status of the TCAM cells by enabling at least the first search control transistor and the second search control transistor of the TCAM cells to responsively output a match result voltage on the associated match line representative of the data match status of the TCAM cells.
15. The TCAM array of claim 14, wherein the match result voltage for each row includes the predetermined voltage when a match is determined between the search data and data stored in TCAM cells searched in each row, and wherein the match result voltage for each row includes a voltage level below the predetermined voltage when a mismatch is determined between the search data and data stored in TCAM cells searched in each row.
16. The TCAM array of claim 10, each of the plurality of TCAM cells further comprising:
a separate write path and read path arrangement, wherein the write path includes the supplemental write input coupled to the SHE layer of the first SOT MTJ and the SHE layer of the second SOT MTJ, and wherein the read path includes the first search control transistor controlled by the first search line and the second search control transistor controlled by the second search line.
17. The TCAM array of claim 10, wherein the SHE layer of the first SOT MTJ element and the SHE layer of the second SOT MTJ element each comprise a Spin Hall Metal (SHM) material comprising one of beta (β)-tungsten and β-tantalum.
18. A method of operating a Ternary Content Addressable Memory (TCAM) cell, comprising:
writing data in a first spin-orbit-torque (SOT) Magnetic Tunnel Junction (MTJ) element by establishing a first current through a first write control transistor and a SHE layer of the first SOT MTJ element, the first current changing a magnetization state of the first SOT MTJ element in accordance with the data; and
writing data in a second SOT MTJ element by establishing a second current through a second write control transistor and a SHE layer of the second SOT MTJ element, the second current changing a magnetization state of the second SOT MTJ element according to a complementary version of the data.
19. The method of claim 18, further comprising:
precharging a match indicator line to a predetermined voltage;
receiving supplemental search data presented on the first search line and the second search line; and
evaluating a data match state between the first SOT MTJ element and the second SOT MTJ element by enabling at least a first read transistor controlled by the first search line and coupled to a pinned layer of the first SOT MTJ and enabling a second read transistor controlled by the second search line and coupled to a pinned layer of the second SOT MTJ; and
presenting a match result voltage representative of the data match state on the match indicator line.
20. The method of claim 19, wherein the match result voltage comprises the predetermined voltage when a tri-state match is determined between the search data and data stored in the first SOT MTJ and the second SOT MTJ, wherein the match result voltage comprises a voltage level lower than the predetermined voltage when a mismatch is determined between the search data and data stored in the first SOT MTJ and the second SOT MTJ.
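The write and search method of claims 18 and 19 can be modeled in software. The following is a hedged behavioral sketch only (the class and method names are illustrative, not from the patent): data is stored as complementary magnetization states in two SOT MTJ elements, and a search leaves the match line at its precharge level only on a match or a wildcard ("don't care") input.

```python
# Behavioral sketch of a single TCAM cell per claims 18-19.
# All names are hypothetical; this is not the patent's implementation.

class TcamCell:
    def __init__(self):
        self.mtj1 = 0  # magnetization state of the first SOT MTJ element
        self.mtj2 = 1  # complementary state of the second SOT MTJ element

    def write(self, bit: int) -> None:
        # SHE write currents set complementary states in the two MTJs
        # (claim 18: data, and its complementary version).
        self.mtj1 = bit
        self.mtj2 = 1 - bit

    def search(self, key) -> bool:
        # 'X' models the tri-state "don't care" input: both search lines
        # are held off, so the cell cannot discharge the match line.
        if key == 'X':
            return True
        # With complementary search lines, a mismatch creates a
        # discharge path that pulls the match line below its precharge
        # voltage; a match leaves the precharge voltage in place.
        return self.mtj1 == key

cell = TcamCell()
cell.write(1)
print(cell.search(1))    # True: match line keeps the precharge voltage
print(cell.search(0))    # False: match line discharges (mismatch)
print(cell.search('X'))  # True: wildcard never discharges the match line
```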
Technical Field
Aspects of the present disclosure relate to the field of content addressable memories and artificial neural networks.
Background
Content Addressable Memories (CAMs) are data storage arrangements that allow fast searching in stored data using input values. Random Access Memory (RAM) uses an input memory address to retrieve data at a particular address. In contrast, a CAM accepts input data or input tags to determine whether the input data is held within the CAM, and if found, generates one or more storage addresses corresponding to matching input data within the CAM. When employed in an arrangement known as associative memory, one or more memory addresses determined by the CAM may then be input to random access memory to produce an output value based on those memory addresses. Another form of CAM is known as Ternary Content Addressable Memory (TCAM), which allows wildcard, "don't care," or undefined portions of input data. This TCAM arrangement may be useful when not all of the digits of the input data are known, and a list of addresses matching the input data pattern with wildcard values may be generated from the CAM. However, TCAM implementations require at least three states to be encoded for each bit, rather than two states for more traditional CAMs (known as binary CAMs).
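The lookup behavior described above can be illustrated with a short software model (a hypothetical sketch; the function names are illustrative and not part of the patent). Stored entries and the search key use 'X' for the wildcard digit, and a search returns the addresses of all matching entries:

```python
# Software model of TCAM lookup: content in, matching addresses out.

def ternary_match(stored: str, key: str) -> bool:
    """True when every digit matches or either side is a wildcard."""
    return len(stored) == len(key) and all(
        s == k or s == 'X' or k == 'X' for s, k in zip(stored, key)
    )

def tcam_search(table, key: str):
    """Return the addresses (indices) of all entries matching the key."""
    return [addr for addr, entry in enumerate(table) if ternary_match(entry, key)]

table = ["1010", "10X0", "0110"]
print(tcam_search(table, "1010"))  # entries 0 and 1 match: [0, 1]
print(tcam_search(table, "01X0"))  # only entry 2 matches: [2]
```

The returned addresses could then index a conventional RAM, giving the associative-memory arrangement described above.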
Various TCAM implementations have been tried, but these implementations suffer from large power consumption, high semiconductor footprint, and limited speed. For example, Complementary Metal Oxide Semiconductor (CMOS) based CAMs can have large static power consumption and large area overhead due to the high transistor count of CMOS based cells. Another TCAM implementation employs Spin Transfer Torque (STT) Magnetic Random Access Memory (MRAM) cells. However, these arrangements have limited search speed, in part due to the low Tunnel Magnetoresistance (TMR) characteristics of STT MRAM configurations. Furthermore, the high write currents required for STT MRAM based TCAMs result in undesirable power consumption and larger feature sizes of the read/write support circuitry.
CAM and TCAM are commonly used in network routing devices. However, these memory structures may also be used in Artificial Neural Networks (ANNs). An ANN may be formed from individual artificial neurons simulated using software, integrated hardware, or other discrete components. Neuromorphic computing may employ ANNs, focusing on the use of electronic components such as analog/digital circuitry in integrated systems to mimic the human brain and to better understand the neuro-biological architecture of the nervous system. Neuromorphic computing emphasizes implementing models of the nervous system to understand how the morphology of individual neurons, synapses, circuits, and architectures leads to desirable computations.
Disclosure of Invention
Ternary Content Addressable Memory (TCAM) circuits are provided herein. In one exemplary embodiment, a TCAM circuit can include a first Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) element having a pinned layer coupled to a first read transistor controlled by a first search line and having a Spin Hall Effect (SHE) layer coupled in a first configuration across a supplemental write input. The TCAM circuit may include a second SOT MTJ element having a pinned layer coupled to a second read transistor controlled by a second search line and having a SHE layer coupled in a second configuration across a supplemental write input. The TCAM circuit may include: a bias transistor configured to provide a bias voltage to drain terminals of the first read transistor and the second read transistor; and a voltage keeper element coupling the drain terminal to the match indicator line.
Drawings
Many aspects of the disclosure can be better understood with reference to the following drawings. While several embodiments are described in connection with these drawings, the disclosure is not limited to the embodiments disclosed herein. On the contrary, the intent is to cover all alternatives, modifications and equivalents.
FIG. 1 illustrates a content addressable memory in an embodiment.
Fig. 2 shows an artificial neural network in an embodiment.
FIG. 3 illustrates a content addressable memory in an embodiment.
Fig. 4 shows an exemplary circuit with a spin orbit torque device in an embodiment.
FIG. 5 illustrates an exemplary content addressable memory having spin orbit torque devices in an embodiment.
FIG. 6 illustrates an exemplary operation of a content addressable memory having spin orbit torque devices in an embodiment.
FIG. 7 illustrates an exemplary operation of a content addressable memory having spin orbit torque devices in an embodiment.
FIG. 8 illustrates a computing system for hosting or controlling an artificial neural network or a content addressable memory having spin orbit torque devices, according to an embodiment.
FIG. 9 illustrates exemplary performance of a magnetic tunnel junction device.
Detailed Description
In the discussion herein, various enhancement circuits are presented. These enhancement circuits may include Content Addressable Memory (CAM) elements to further speed up operation and reduce power consumption of the neural network, among other applications. For example, the CAM structure may be employed in any content addressable memory and any content addressable memory application. A Ternary Content Addressable Memory (TCAM) structure is discussed herein that allows for the generation of matching results using wildcard, "don't care" or undefined portions of the input data. One such TCAM structure discussed herein includes two Spin Hall Effect (SHE) Magnetoresistive Random Access Memory (MRAM) cells that employ Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) elements.
CAM and TCAM structures typically employ non-volatile memory (NVM) elements to store data that can be searched using input data or input tags. Past attempts at CAM/TCAM structures include Complementary Metal Oxide Semiconductor (CMOS) structures implementing Static Random Access Memory (SRAM) elements. While feasible for CAM/TCAM, CMOS based cells are large and consume more power than other cell types. Therefore, CMOS based cells are not desirable for use in neural network TCAM structures, such as those discussed below in fig. 2.
The CAM/TCAM structure may alternatively be formed using a Magnetic Tunnel Junction (MTJ) or various resistive memory devices, such as memristors. The MTJ element may be used to form a data storage element. The MTJ operates using a Tunnel Magnetoresistance (TMR), which is a magnetoresistive effect. MTJs typically consist of two layers of ferromagnetic material separated by a thin insulator layer through which electrons can quantum-mechanically tunnel from one ferromagnetic layer into the other. One ferromagnetic layer of the MTJ may be referred to as a pinned layer having a fixed magnetization state, while the other ferromagnetic layer of the MTJ includes a free layer that can change magnetization state. The intermediate layer, which comprises a thin insulator separating two ferromagnetic layers, may be formed of an oxide material or other suitable electrical insulator. Electrical terminals may be formed to interface the free and pinned layers of the MTJ with other components in the circuit.
The MTJ element may generally be placed in two different states, which may correspond to different logical values stored therein. These states depend on the magnetization state of the MTJ element, which corresponds to the magnetoresistive value currently assumed by the MTJ element. The variable magnetization state of the MTJ elements discussed herein can be varied between two states, a parallel state and an anti-parallel state. The parallel state occurs when the free and pinned layers of the MTJ element are in the same magnetization state. The antiparallel state occurs when the free layer and the pinned layer of the MTJ element are in different magnetization states. Logical values may be assigned to magnetization states, such as a logical "0" for an anti-parallel state and a logical "1" for a parallel state, among other configurations.
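The mapping from magnetization state to resistance and logic value described above can be sketched numerically (the resistance value and TMR ratio below are assumed for illustration and are not taken from the patent):

```python
# Illustrative MTJ state model. R_P and TMR are assumed example values.

R_P = 5_000.0   # parallel-state resistance in ohms (assumed)
TMR = 1.0       # tunnel magnetoresistance ratio: (R_AP - R_P) / R_P (assumed)

def mtj_resistance(free_parallel_to_pinned: bool) -> float:
    """Resistance implied by the magnetization state via the TMR relation."""
    return R_P if free_parallel_to_pinned else R_P * (1.0 + TMR)

def logic_value(free_parallel_to_pinned: bool) -> int:
    # Assignment used in the text: parallel -> logic "1", antiparallel -> "0".
    return 1 if free_parallel_to_pinned else 0

print(mtj_resistance(True), logic_value(True))    # 5000.0 1 (parallel)
print(mtj_resistance(False), logic_value(False))  # 10000.0 0 (antiparallel)
```

A larger TMR widens the gap between the two resistance states, which is why the higher TMR of SOT MTJ devices (discussed below) eases read sensing.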
The MTJ types may include various configurations that may be employed in artificial neural network circuits and CAM/TCAM circuits. MTJ devices typically employ spin-polarized current to reversibly switch the magnetization state of a ferromagnetic layer, i.e., the magnetization state of the free layer described above. A perpendicular or parallel arrangement of MTJ elements may be employed, which refers to a type of magnetic anisotropy associated with a preferred alignment direction of magnetic moments within the MTJ elements relative to the corresponding semiconductor substrate surface. A first type of MTJ configuration includes a uniform perpendicular Spin Transfer Torque (STT) device, which typically includes a 2-terminal device formed of at least three stacked material layers. The trilayer includes a tunnel barrier layer disposed between the pinned layer and the free layer. The free layer and the pinned layer are coupled to two terminals of the STT MTJ.
STT MTJ based TCAM cells have been developed, which may consist of several control transistors and two STT MTJ elements. This structure has advantages over the CMOS configuration described above, including little static power consumption, a more compact size, and a reduced transistor count. Also, STT MTJ based TCAM configurations may use shared or separate Write Lines (WL) and read/Search Lines (SL). Separate write lines and search lines may have structural advantages over CMOS designs. However, in STT MTJ based circuits, the search speed (read speed) is limited, in part due to the low Tunnel Magnetoresistance (TMR) characteristics of STT MTJ elements and the higher relative write current inherent in STT configurations. Therefore, to increase TCAM search speed, a larger size STT MTJ must be employed. Also, due to the circuit configuration, read disturb may be encountered.
Due to limitations and performance issues of CMOS and STT MTJ based TCAM designs, MTJ based enhancements are now being proposed. One such MTJ-based design discussed herein includes two Spin Hall Effect (SHE) Magnetoresistive Random Access Memory (MRAM) cells that employ heterogeneous in-plane Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) elements. This enhanced design may be used for any CAM or TCAM, which may or may not be employed in neural network applications. Accordingly, these enhanced cell circuit structures may be provided to a general content addressable memory structure.
The SOT MTJ device typically comprises a 3-terminal device. The SOT MTJ device may have an additional metal bottom layer terminal and other differences compared to the two-terminal STT MTJ device. In these SOT MTJ configurations, separate "write" and "read" current paths are provided, which may allow for longer device lifetimes. In an SOT MTJ device, the write current is carried through a separate underlayer rather than through the tunnel barrier layer, as occurs in STT MTJ elements. The write current through the tunnel barrier layer in the STT MTJ element may cause more wear and damage to the tunnel barrier layer material than in the SOT MTJ element. Furthermore, when separate write and read paths are employed, the read and write control elements (such as read or write control transistors) may have smaller relative dimensions compared to the STT MTJ structure. This is due in part to the greater Tunnel Magnetoresistance (TMR) of the SOT MTJ configuration compared to the STT MTJ structure. In particular, the SOT MTJ device may employ a higher TMR than the STT MTJ device, which may reduce the write and read energy required for the CAM/TCAM structures discussed herein. The reduced required energy corresponds to less read/write current required in the SOT MTJ configuration compared to other MRAM or CMOS structures, which may result in longer device lifetimes.
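The separate write and read paths of a 3-terminal SOT MTJ can be summarized in a small behavioral model (a sketch under assumed values; the class name, current polarities, and resistances are illustrative only, not the patent's device physics):

```python
# Behavioral sketch of a 3-terminal SOT MTJ: the write current flows
# only through the spin Hall metal underlayer (never the tunnel
# barrier), and its polarity selects the free-layer state; the read
# path crosses the tunnel barrier and senses resistance.

class SotMtj:
    def __init__(self):
        self.free_parallel = True  # free layer parallel to pinned layer

    def write(self, channel_current: float) -> None:
        # Spin-orbit torque from the in-plane SHM charge current switches
        # the free layer; polarity (assumed convention) picks the state.
        self.free_parallel = channel_current > 0

    def read(self, r_parallel: float = 5e3, tmr: float = 1.0) -> float:
        # Read senses the tunnel-barrier resistance; no write current
        # ever stresses the barrier in this path.
        return r_parallel if self.free_parallel else r_parallel * (1 + tmr)

d = SotMtj()
d.write(-100e-6)   # negative channel current -> antiparallel state
print(d.read())    # high resistance: 10000.0
d.write(+100e-6)   # positive channel current -> parallel state
print(d.read())    # low resistance: 5000.0
```

Because the write path never crosses the tunnel barrier, barrier wear is avoided, which is the lifetime advantage described above.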
The SOT MTJ may also be referred to herein as a Spin Hall Effect (SHE) MTJ, where the metal underlayer comprises Spin Hall Metal (SHM). An exemplary SHE MTJ structure is shown in fig. 4 below. Other methods may replace the SOT in-plane MTJ with an SOT perpendicular MTJ with an external electric field applied. However, this external field may reduce the thermal stability of adjacent circuits.
Before discussing enhanced SOT/SHM MTJ based CAM/TCAM structures, the relative performance between the various CAM/TCAM memory cell technologies will be briefly introduced. FIG. 9 shows a performance comparison between various techniques for implementing memory cell structures that may be employed in CAM/TCAM designs. FIG. 9 shows that read performance is increased by increasing thermal stability (Δ) in the SHE-MRAM architecture. As can be seen from graphs 900-902 in FIG. 9, the SHE-MRAM structure exhibits a small write overhead even with a higher thermal stability (Δ) compared to the STT-MRAM structure. A higher Δ also allows for a larger read current. Thus, by using the SHE-MRAM architecture, read latency can be reduced with minimal sacrifice of write latency.
In graph 900 of FIG. 9, a comparison of write delay behavior between an STT-MRAM device and an SHE-MRAM device is shown as the thermal stability indicated along the horizontal axis increases. It can be seen that the SHE-MRAM device maintains a lower write delay over a greater range of thermal stability than the STT-MRAM device. In graph 901 of FIG. 9, the read current of the SHE-MRAM device is shown over the range of thermal stability along the horizontal axis. In graph 902 of FIG. 9, the read delay of the SHE-MRAM device is shown over the range of thermal stability along the horizontal axis. The read current and read delay performance may be affected by the material selection, material purity, and material composition of the underlying layer of the SHE-MRAM structure.
Table 903 in FIG. 9 shows the performance of the SHE-MRAM employed in a Level 2 (L2) cache structure. In Table 903, a comparison is made between various types of memory cell structures, such as CMOS (SRAM), STT-MRAM, and SHE-MRAM structures. The SHE-MRAM device provides similar read delay to, but lower leakage and denser area utilization than, the SRAM device. The SHE-MRAM device also performs better than STT-MRAM devices, with reduced bit cell failure rates, while STT-MRAM and SHE-MRAM devices have similar footprints. Furthermore, the higher Tunnel Magnetoresistance (TMR) of SHE-MRAM devices can be employed to reduce the read energy required for the CAM/TCAM structures discussed herein.
Turning now to an enhanced architecture for implementing Content Addressable Memory (CAM) and Ternary Content Addressable Memory (TCAM) devices, fig. 1 is presented. As described above, a CAM is a memory that allows for the lookup of stored data via input search data instead of data addresses (as is done with most random access memory devices). The CAM compares the incoming search data with the stored data table and returns the address of the matching data. This address can then be used to retrieve the data itself from memory. Various types of CAMs may be formed, one exemplary type of which is referred to as ternary CAM (TCAM). Conventional CAMs require binary formatting of the input search data. However, TCAMs allow the use of a third state (tri-state), "don't care" or wildcard, for undefined portions of the input search data, so that there is no need to present an exact input search data instance to the TCAM to generate a resulting address. Fig. 1 shows an
In fig. 1,
Also shown in fig. 1 is a
One exemplary application of the CAM/TCAM structure is an Artificial Neural Network (ANN). Conventional processing equipment as well as specialized circuitry may be used to form the artificial neural network. Exemplary processing devices include a Central Processing Unit (CPU) or Graphics Processing Unit (GPU) that is suitable for machine learning applications, such as image recognition, speech recognition, handwriting recognition, and other applications.
To perform machine learning operations, CPUs may be a limited choice due to the architectural design of most CPUs. For example, CPUs are good at processing a very complex instruction set very efficiently but lack parallelism. In machine learning calculations, especially training operations, the basic operation is vector matrix multiplication. GPUs, which have begun to gain favor over CPUs, use a parallel architecture and are adept at handling many parallel very simple instruction sets. Another emerging option is ASICs, such as Tensor Processing Units (TPU), which are adept at performing a particular task. As machine learning is increasingly integrated into everyday applications, there is an increasing interest in making these specialized chips for machine learning tasks and making existing processor-based designs more efficient.
In addition to data processing speed, another problem in machine learning and neural networks is power consumption. A machine learning task may consume up to several hundred watts on a GPU or TPU, in contrast to the human brain, which performs similar cognitive tasks using only about 20 watts. This high power consumption disadvantage has motivated research into biologically inspired or brain-inspired methods (such as neuromorphic computing) to address machine learning power consumption issues. As will be discussed below, another approach is taken to reduce power consumption when using a GPU or other processing device. This enhanced approach uses a new structure of Content Addressable Memory (CAM) devices to supplement the operation of GPUs in neural networks.
As described above, an ANN such as a Convolutional Neural Network (CNN) may be implemented using a multi-core processor formed by GPUs. In GPU architectures, a Floating Point Unit (FPU), which includes an Adder (ADD) element, a Multiplier (MUL) element, and a multiply accumulator element in a streaming core that processes data, consumes a large amount of energy. The enhanced CAM/TCAM example herein may be used in an ANN using a processor (such as a GPU). In particular, a Content Addressable Memory (CAM) may be coupled to a processing pipeline in a GPU or other processor. The CAM may be used to store high frequency patterns encountered by the CNN. These high frequency patterns can be efficiently searched via search terms or search data, which can provide significant computational reduction and power savings in CNNs.
Fig. 2 illustrates an
In FIG. 2, input 201 is presented to a pipeline corresponding to FPU stages 211-215 of FPU 210 within a GPGPU. A floating point result (Q_FPU) results from processing the input through the FPU pipeline. Input 201 is also simultaneously presented to TCAM 221 as input search data. When the search is successful in TCAM 221 (e.g., a search hit), then hit indicator signal 223 is presented to pipeline control circuit 216. Pipeline control circuit 216 may include a clock circuit that provides a clock signal to FPU stages 211-215. The hit indicator signal 223 indicates to the clock circuit of the pipeline control circuit 216 to gate or otherwise disable the clock signal to FPU stages 211-215. Once a hit results in a data output from the memory associated with TCAM 221, a corresponding result (Q_AM) is provided as the output of the pipeline instead of the Q_FPU result.
As can be seen in FIG. 2, when the input is contained in associative memory 220, then the input need not be pipelined by the FPU. Significant power savings may be achieved by disabling the FPU pipeline in these cases using the hit indicator signal 223 and clock circuit control. In addition to clock control circuitry, other ways of controlling the FPU pipeline may be implemented, such as power control gating of FPU pipeline circuit elements, logic disabling of FPU pipeline circuits, or other techniques. However, when the input operands are not contained in associative memory 220, then the FPU pipeline may be enabled to process the input to produce a result. The selection of either the TCAM based result or the FPU based result is based on whether a match in the TCAM is indicated.
Associative memory 220 is updated to hold frequent results from processing input operand data through the FPU pipeline. The updating may be performed based on various criteria, such as when the results are similar to previous results, or using each result during an initialization period until the associative memory is filled to capacity. Subsequent hits may indicate that the resulting data is allowed to remain in the associative memory, and data having few hits associated therewith may be replaced with new results provided by the FPU pipeline. Since TCAM 221 may indicate a successful hit based on the tri-state formatted data, there is no need to present an exact match as an input operand to produce a hit. In machine learning applications and other neural network applications, this match/hit may be sufficient to produce a result and eliminate power consumption of the FPU pipeline stages.
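The hit-gating scheme described above can be sketched in software (a hedged analogy only; the fill-until-full update policy, the multiply stand-in for the FPU, and all names are illustrative assumptions, not the patent's method): an associative memory of frequent operand patterns short-circuits the FPU pipeline on a hit, and the pipeline fills the memory on a miss.

```python
# Software analogy of fig. 2: associative-memory lookup gates the FPU.

def fpu_pipeline(a: float, b: float) -> float:
    """Stand-in for the multi-stage FPU computation (here: multiply)."""
    return a * b

class AssociativeMemory:
    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.table = {}

    def lookup(self, key):
        return self.table.get(key)  # None models a TCAM miss

    def update(self, key, result):
        # Simple fill-until-full policy (one of the criteria mentioned).
        if len(self.table) < self.capacity:
            self.table[key] = result

am = AssociativeMemory()

def compute(a: float, b: float) -> float:
    hit = am.lookup((a, b))
    if hit is not None:
        return hit               # hit: FPU clock gated, use the Q_AM result
    result = fpu_pipeline(a, b)  # miss: run the pipeline for the Q_FPU result
    am.update((a, b), result)
    return result

print(compute(3.0, 4.0))  # miss: pipeline computes 12.0 and fills the table
print(compute(3.0, 4.0))  # hit: same 12.0 returned from associative memory
```

A hardware TCAM additionally allows wildcard digits in the key, so a hit does not require the exact operand pattern, as the text notes.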
Fig. 3 presents a schematic diagram 300 of a content addressable memory arrangement.
Control circuitry 360 is configured to write data into the content addressable memory cells.
To implement a TCAM, various methods can be employed. Referring to fig. 3, each "cell" (C) component in the content addressable memory can be implemented as discussed below.
In fig. 4, a control circuit, not shown, controls the direction of charge movement through underlying material 431.
Also shown in fig. 4 are the peripheral structures used to form an MRAM configuration that allows data to be written and read through changes in the magnetization state of MTJ element 420. These structures include a read control transistor (440) and a write control transistor (441), as well as various control lines.
The operation of the SHE-MRAM architecture shown in fig. 4 may be performed in accordance with the voltages presented in table 401. In operation, the first control transistor 440, or read switching element, controls the read path and is coupled to the pinned layer of MTJ element 420.
Advantageously, the SHE-MRAM architecture in FIG. 4 provides a low channel current per unit of thermal stability (Δ), i.e., a low I_CHANNEL/Δ, where I_CHANNEL is the current through underlying material 431, together with efficient spin generation per unit of charge current (i.e., I_SPIN/I_CHARGE > 100%). Exemplary materials for underlying material 431 include heavy metals exhibiting a strong spin Hall effect, such as platinum, tantalum, or tungsten.
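The I_SPIN/I_CHARGE > 100% figure follows from the geometry of SOT devices. A commonly used approximation (an assumption brought in here, not stated in this document) is I_spin/I_charge ≈ θ_SH · L_MTJ/t_SHM, where θ_SH is the spin Hall angle of the channel material, L_MTJ is the MTJ length along the current direction, and t_SHM is the channel film thickness:

```python
# Back-of-envelope spin-current gain for a spin Hall channel.
# Approximation (assumption): I_spin / I_charge ~ theta_SH * (L_MTJ / t_SHM).

def spin_current_gain(theta_sh: float, l_mtj_nm: float, t_shm_nm: float) -> float:
    """Dimensionless ratio of spin current delivered to charge current supplied."""
    return theta_sh * (l_mtj_nm / t_shm_nm)

# Illustrative numbers: theta_SH ~ 0.3 (e.g., beta-tungsten), a 60 nm MTJ,
# and a 4 nm channel film give a gain of 4.5, i.e., well above 100%.
gain = spin_current_gain(0.3, 60.0, 4.0)
```

Because the same charge current acts on the free layer along its whole length, the delivered spin current can exceed the supplied charge current, which is the efficiency advantage the text refers to.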
We turn now to an exemplary implementation of a TCAM cell based on SHE-MRAM devices. FIG. 5 shows an enhanced CAM/TCAM cell structure with SHE (SOT)-MRAM elements. In particular, fig. 5 shows a TCAM cell formed from two SHE-MRAM structures (540, 541) and associated control elements.
As described above,
It should be noted that the "B" suffix used on the control lines in fig. 5 indicates a logical complement of the companion signal. For example, line SLB (517) is a logically complementary version of line SL (516). A complementary version refers to a logical negation or inversion of a particular signal or logical value. For example, when line SL is at a particular voltage level (such as V_DD, or a logic "1"), then line SLB is at the complementary voltage level (i.e., 0 V, or a logic "0").
We turn now to the structure of the enhanced TCAM cell of fig. 5.
Additional control elements are included in the TCAM cell structure of fig. 5. First, precharge transistor 530 (M0) acts as a precharge component configured to pull ML 511 to a predetermined voltage level (such as V_DD) prior to a read/search operation, responsive to the precharge control signal (PC).
Another element 536 (M6) is included as a voltage keeper coupled between the match line and the bias voltage.
Six transistors are employed in the exemplary SHE-MRAM device of fig. 5 to implement separate read and write paths. Thus, compared to a four-transistor STT-MRAM device, a larger structural footprint might be assumed. However, the individual transistors can be made relatively small: M1 and M2 serve only read operations, and the write control transistors M3 and M4 carry only the relatively low SOT-MRAM write current.
As described above, the first SHE-MRAM structure includes an SHM layer coupled to the supplemental write inputs 512, 513 (BL, BLB) in a first configuration, which refers to a particular arrangement and set of connections between the supplemental write inputs and the first SHM layer.
The SHM 545 is coupled to the supplemental write inputs 512, 513 (BL, BLB) in a second configuration. As used herein, the second configuration refers to a particular arrangement and set of connections between the supplemental write inputs 512, 513 (BL, BLB) and the second SHM layer 545.
The operation of the SHE-MRAM device based TCAM cell is discussed below.
Data values may be written into the TCAM prior to a search/read operation of the TCAM. WL1 is configured to control write control transistor 533 (M3) and write data into the left MTJ structure, where the data is based on the direction of current between the BL/BLB lines. WL2 is configured to control write control transistor 534 (M4) and write data into the right MTJ structure, likewise based on the current between the BL/BLB lines. WL1 and WL2 may be enabled simultaneously or sequentially, depending on the data to be written, in part because the BL/BLB lines are shared. FIG. 5 shows exemplary write values corresponding to the BL/BLB lines in the accompanying truth table.
The write process for each of the SHE-MRAM structures 540-541 may write a logical "1" or "0" to the corresponding MTJ element.
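The per-structure write can be modeled behaviorally. In this Python sketch (the resistance values, the sign convention for current direction, and the logic-level assignment are illustrative assumptions, not taken from fig. 5), the polarity of the charge current driven across the SHM via BL/BLB sets the free-layer magnetization, and the MTJ resistance then reflects parallel or antiparallel alignment with the pinned layer:

```python
# Behavioral model of a SOT (SHE-MRAM) write: current direction through the
# spin Hall metal sets the free layer; the pinned layer is fixed.
# Resistance values and polarity conventions are illustrative assumptions.

R_P  = 5_000   # ohms, parallel state (logic "0" under this convention)
R_AP = 10_000  # ohms, antiparallel state (logic "1")

class SotMtj:
    def __init__(self):
        self.free = +1          # free-layer magnetization (+1 or -1)
        self.pinned = +1        # pinned-layer magnetization, fixed

    def write(self, bl: int, blb: int):
        """Drive current across the SHM via BL/BLB; only the sign matters."""
        if bl == blb:
            return              # no potential difference -> no switching
        self.free = +1 if bl > blb else -1

    def resistance(self) -> int:
        return R_P if self.free == self.pinned else R_AP

    def read_bit(self) -> int:
        return 0 if self.resistance() == R_P else 1
```

Under this convention, driving BL high relative to BLB leaves the layers parallel (low resistance, logic "0"), while reversing the polarity writes the antiparallel, high-resistance state (logic "1"); equal potentials leave the stored bit untouched.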
In fig. 6, the SHE-MRAM based TCAM cell provides a two-step search/read operation, namely a precharge phase and an evaluation phase. In the precharge phase, the read control transistors (M1, M2) are turned off via the SL/SLB lines, and the match line (ML) 511 is precharged by using the precharge control signal (PC) to enable the precharge transistor (M0).
Table 600 shown in fig. 6 further shows the correspondence among stored data, search lines, and match lines in the TCAM cell.
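The two-phase search can be sketched as follows. This Python model assumes a NOR-style match line, a convention common to TCAMs generally (table 600's exact polarity assignments are not reproduced here): ML is precharged high, and during evaluation any mismatching cell provides a discharge path, so ML remaining high indicates a match.

```python
# Two-phase TCAM search sketch: precharge, then evaluate.
# NOR-style convention assumed: a mismatch discharges the match line (ML).

def tcam_search(stored_word, key_bits):
    """stored_word: list of '0', '1', or 'X' (don't care); key_bits: 0/1 ints."""
    ml = 1                                        # precharge phase: ML pulled high
    for stored, k in zip(stored_word, key_bits):  # evaluation phase
        if stored == 'X':
            continue                              # masked cell never discharges ML
        if int(stored) != k:
            ml = 0                                # mismatch discharges ML
    return ml                                     # 1 = match/hit, 0 = miss
```

The 'X' branch captures the ternary "don't care" state: a masked cell can never discharge ML, so it matches any search bit.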
Control circuitry (such as discrete control logic, integrated control logic, processing devices, firmware, software, or other control elements) may be used to control the operation of the circuitry presented in fig. 5. The control circuit may be integrated with one or more instances of the circuit in fig. 5. For example, when the circuit in fig. 5 is used to form a TCAM cell array, then the control circuit may be coupled to shared control lines for TCAM cells arranged in rows and columns. Specifically, the SL/SLB, WL1/WL2, and BL/BLB lines may be coupled to an input driver circuit as shown in FIG. 3. The Match Line (ML) may be coupled to a sense amplifier and further output circuitry for presenting a data match status to further circuitry or processing equipment. When used in a neural network circuit, the TCAM array may be used to speed up operation of the neural network and reduce power consumption of the neural network when a match of input data is found in the TCAM array, as shown in fig. 2.
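At the array level, rows of cells share the SL/SLB (and WL/BL) lines per bit position while each row drives its own match line, so one search evaluates all rows in parallel. A sketch of the resulting match bit vector follows (the list-based encoding is an illustrative assumption):

```python
# Array-level TCAM search sketch: every stored row is compared against the
# search key in parallel; the per-row match lines form a bit vector.

def tcam_array_search(rows, key_bits):
    """rows: list of stored words, each a list of '0'/'1'/'X' (don't care)."""
    def row_match(word):
        return all(s == 'X' or int(s) == k for s, k in zip(word, key_bits))
    return [1 if row_match(word) else 0 for word in rows]

# Downstream encoder/multiplexer logic (as with the sense amplifiers and
# output circuitry mentioned above) could reduce this vector to a hit
# address or a single hit/miss flag.
```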
In another exemplary circuit that may be used as a TCAM cell, a first Magnetoresistive Random Access Memory (MRAM) structure includes a first Magnetic Tunnel Junction (MTJ) element coupled to a first Spin Hall Metal (SHM) layer at a corresponding free layer. The first MTJ element is coupled at a corresponding pinned layer to a first read control transistor (M1) controlled by a first Search Line (SLB). The first SHM layer includes a first terminal coupled to a first write control transistor (M3) controlled by a first write line (WL1), and a second terminal coupled to a Bit Line (BL). The second MRAM structure includes a second MTJ element coupled to the second SHM layer at a corresponding free layer. The second MTJ element is coupled at a corresponding pinned layer to a second read control transistor (M2) controlled by a second Search Line (SL). The second SHM layer includes a first terminal coupled to a second write control transistor (M4) controlled by a second write line (WL2), and a second terminal coupled to a Bit Line (BL). The circuit may include a bias transistor (M5), the bias transistor (M5) configured to provide a bias voltage to the first read control transistor (M1) and the second read control transistor (M2). When used in an array having more than one TCAM cell sharing ML, the circuit may include a voltage keeper element (M6) coupled between the match line and the bias voltage, and a precharge element (M0) coupled to the Match Line (ML).
Fig. 7 presents additional exemplary operations of the circuits discussed above (such as the circuit of fig. 5). In fig. 7, the write process 700 is discussed in operations 701-702, and the search or read process is discussed in operations 703-704.
In the write process 700, the control circuit writes data into a first Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) element of a TCAM cell. With respect to the elements of fig. 5, the control circuitry may write (701) data into the first MRAM structure (540) by enabling at least the first write control transistor 533 (M3) and presenting a write voltage level between a first terminal (520) of the first SHM layer and a second terminal coupled to the bit line (BL).
In a precharge operation (703), the control circuitry precharges ML 511 to a predetermined voltage level, such as V_DD, using the precharge control signal (PC).
After precharging ML 511, a search or read process can evaluate (704) the data match state within the TCAM cell by evaluating the input data against data previously stored in the first MRAM structure 540 and the second MRAM structure 541.
As discussed herein, a TCAM based on SHE-MRAM devices has several advantages over an STT-MRAM TCAM. For example, TCAMs based on SHE-MRAM devices achieve faster searching by employing larger read currents than TCAMs based on STT-MRAM devices. TCAMs based on SHE-MRAM devices are also more robust against data retention failures, in part because of the separate read/write paths enabled by the SOT-MRAM elements. The SOT-MRAM elements also enable lower write currents compared to CMOS-based TCAMs, while still maintaining near-zero static power consumption and a compact bit cell size. The manufacturing area overhead of the elements of the SHE-MRAM device based TCAM circuit detailed in FIG. 5 is also advantageous. For example, the M1, M2 transistors may be smaller than transistors in STT- or CMOS-based TCAMs, in part because M1, M2 are used only for read operations. Although two additional write control transistors (M3, M4) are employed in the SHE-MRAM device based TCAM, these transistors are relatively small due to the relatively low SOT-MRAM write current.
Thus, the SHE-MRAM configuration proposed herein can improve the read speed of TCAM cells with a larger read current by increasing thermal stability, since the corresponding write overhead is insignificant due to the high spin polarization efficiency. The TCAM based on SHE-MRAM devices can be successfully used in SOT-MRAM based artificial neural network engines, such as those discussed above in FIG. 2.
FIG. 8 illustrates a computing system 801 that represents any system or collection of systems in which the various operating architectures, scenarios, and processes disclosed herein can be implemented. For example, the computing system 801 may be used to implement the control circuitry of the elements of fig. 1, the control portions of the elements of fig. 2, and the FPU stage of fig. 2, the control circuitry 360 of fig. 3, the control circuitry of fig. 5, and other circuitry discussed herein. Further, computing system 801 may be used to store write data prior to storage to a TCAM unit and to store search results after the search process is completed. In another example, computing system 801 may configure interconnect circuitry to establish one or more arrays of TCAM cells or to connect TCAM cells into an artificial neural network circuit. In still other examples, computing system 801 may fully implement an artificial neural network, such as the artificial neural network shown in fig. 2, to create an at least partially software-implemented artificial neural network through an externally-implemented enhanced TCAM cell structure. Computing system 801 may implement control of any of the TCAM unit operations discussed herein, whether implemented using hardware or software components, or any combination thereof.
Examples of computing system 801 include, but are not limited to, computers, smart phones, tablet computing devices, notebook computers, desktop computers, hybrid computers, rack-mounted servers, web servers, cloud computing platforms, cloud computing systems, distributed computing systems, software-defined networking systems and data center devices, and any other type of physical or virtual machine, and other computing systems and devices, and any variations or combinations thereof.
Computing system 801 may be implemented as a single apparatus, system, or device, or may be implemented in a distributed fashion as multiple apparatuses, systems, or devices. Computing system 801 includes, but is not limited to, a processing system, storage system 803, software 805, and communication interface system 807.
Still referring to fig. 8, the processing system loads and executes software 805 from storage system 803.
Storage system 803 may include any computer-readable storage medium readable by the processing system and capable of storing software 805.
In addition to computer-readable storage media, in some embodiments, storage system 803 may also include computer-readable communication media over which at least some of software 805 may be communicated, internally or externally. Storage system 803 may be implemented as a single storage device, but may also be implemented across multiple storage devices or subsystems that are co-located or distributed relative to each other. Storage system 803 may include additional elements, such as a controller, capable of communicating with the processing system.
The software 805 may be implemented with program instructions and, among other functions, when executed by the processing system, may direct the processing system to operate as described herein.
In particular, the program instructions may include various components or modules that cooperate or otherwise interact to perform the various processes and operational scenarios described herein. The various components or modules may be embodied in compiled or interpreted instructions, or in some other variation or combination of instructions. The various components or modules may execute in a synchronous or asynchronous manner, in serial or parallel, in a single threaded environment or in multiple threads, or according to any other suitable execution paradigm, variant, or combination thereof. In addition to or including the
In general, the software 805, when loaded into the processing system and executed, transforms a suitable apparatus, system, or device from a general-purpose computing system into a special-purpose computing system customized to operate as described herein.
For example, if the computer-readable storage medium is implemented as a semiconductor-based memory, the software 805 may transform the physical state of the semiconductor memory when the program instructions are encoded therein, such as by transforming the state of transistors, capacitors, or other discrete circuit elements that make up the semiconductor memory. Similar transformations may occur with respect to magnetic media or optical media. Other transformations of physical media are possible without departing from the scope of the present description, with the foregoing examples provided only to facilitate this discussion.
The software 805 may include one or more services, such as TCAM read/write (R/W) service 824.
In one example, TCAM R/W service 824 controls various control lines (such as the WL1/WL2 and BL/BLB lines in fig. 5) to write data into a TCAM cell or array. The TCAM R/W service 824 can control the enabling/disabling of the write control transistors in the proper order to properly write data into the TCAM cells and TCAM array. The TCAM R/W service 824 may also perform search or read operations, controlling lines such as the PC, BIAS, and SL/SLB lines in FIG. 5 to search for matches to data previously written into a TCAM cell or TCAM array. The TCAM R/W service 824 can read results presented on the match lines (ML). In some examples, the TCAM R/W service 824 may implement output encoder/decoder or multiplexer logic to assemble discrete search result values into a bit vector, or to process multiple search match outputs generated by the TCAM. The TCAM R/W service 824 may transmit the resulting search match/no-match indication to one or more other systems via communication interface system 807, or present the indication to one or more users via a user interface.
When a TCAM structure is employed in a GPU-implemented ANN or CNN, similar control operations may be employed, for example to bypass FPU pipeline stages upon a search hit as in fig. 2.
The communication interface system 807 may include communication connections and devices that allow communication with other computing systems (not shown) over a communication network (not shown). The communication interface system 807 may also communicate with portions of the hardware implemented ANN, such as with layers of the ANN, or TCAM structures and circuits. Examples of connections and devices that together allow for inter-system communication may include NVM memory interfaces, network interface cards, antennas, power amplifiers, RF circuitry, transceivers, and other communication circuitry. The connections and devices may communicate over a communication medium, such as metal, glass, air, or any other suitable communication medium, to exchange communications or data with other computing systems or system networks.
The
Communication between computing system 801 and other computing systems (not shown) may occur over one or more communication networks and according to various communication protocols, combinations of protocols, or variations thereof. Examples include an intranet, the internet, a local area network, a wide area network, a wireless network, a wired network, a virtual network, a software defined network, a data center bus, a computing backplane, or any other type of network, combination of networks, or variations thereof. The above communication networks and protocols are well known and need not be discussed in detail here. However, some communication protocols that may be used include, but are not limited to, internet protocol (IP, IPv4, IPv6, etc.), Transmission Control Protocol (TCP), and User Datagram Protocol (UDP), as well as any other suitable communication protocol, variation, or combination thereof.
The description and drawings are included to depict specific embodiments to teach those skilled in the art how to make and use the best mode. For the purpose of teaching inventive principles, some conventional aspects have been simplified or omitted. Those skilled in the art will appreciate variations from these embodiments that fall within the scope of the disclosure. Those skilled in the art will also appreciate that the above-described features may be combined in various ways to form multiple embodiments. Accordingly, the present invention is not limited to the specific embodiments described above, but only by the claims and their equivalents.