Content addressable memory with spin orbit torque device
Abstract: This technology, "Content addressable memory with spin orbit torque device," was designed and created by W. H. Choi and J. Kim on 2019-06-03. The invention provides a content addressable memory with a spin orbit torque device. Ternary Content Addressable Memory (TCAM) circuits are provided herein. In one exemplary embodiment, a TCAM circuit may include a first Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) element having a pinned layer coupled to a first read transistor controlled by a first search line and having a Spin Hall Effect (SHE) layer coupled in a first configuration across a supplemental write input. The TCAM circuit may include a second SOT MTJ element having a pinned layer coupled to a second read transistor controlled by a second search line and having a SHE layer coupled in a second configuration across the supplemental write input. The TCAM circuit may include: a bias transistor configured to provide a bias voltage to drain terminals of the first read transistor and the second read transistor; and a voltage keeper element coupling the drain terminals to the match indicator line.
1. A Content Addressable Memory (CAM) circuit, comprising:
a first Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) element having a pinned layer coupled to a first read transistor controlled by a first search line and having a Spin Hall Effect (SHE) layer coupled in a first configuration across a supplemental write input;
a second SOT MTJ element having a pinned layer coupled to a second read transistor controlled by a second search line and having a SHE layer coupled in a second configuration across the supplemental write input;
a bias transistor configured to connect a bias voltage to drain terminals of the first read transistor and the second read transistor; and
a voltage keeper element coupling the drain terminals to a match indicator line.
2. The CAM circuit of claim 1, comprising:
the SHE layer of the first SOT MTJ element, the SHE layer coupled in the first configuration across the supplemental write input by a first write control transistor controlled by a first write control line; and
a SHE layer of the second SOT MTJ element, the SHE layer coupled in the second configuration across the supplemental write input by a second write control transistor controlled by a second write control line.
3. The CAM circuit of claim 2, comprising:
a control circuit configured to write data into the first SOT MTJ element by at least enabling the first write control transistor to establish a first current through the SHE layer of the first SOT MTJ element, the first current changing a magnetization state of the first SOT MTJ element according to the data; and
the control circuit is configured to write data into the second SOT MTJ element by at least enabling the second write control transistor to establish a second current through the SHE layer of the second SOT MTJ element that changes a magnetization state of the second SOT MTJ element according to a complementary version of the data.
4. The CAM circuit of claim 1, comprising:
a precharge element coupling the match indicator line to a predetermined voltage according to a precharge control signal.
5. The CAM circuit of claim 1, comprising:
a control circuit configured to precharge the match indicator line to a predetermined voltage and disable the first read transistor and the second read transistor; and
after precharging the match indicator line, the control circuit is configured to evaluate a data match state between the first SOT MTJ element and the second SOT MTJ element by enabling at least the first read transistor and the second read transistor when supplemental search data is present on the first search line and the second search line to responsively output a match result voltage on the match indicator line representative of the data match state.
6. The CAM circuit of claim 5, wherein the match result voltage comprises the predetermined voltage when a match is determined between the search data and data stored in the first SOT MTJ and the second SOT MTJ, wherein the match result voltage comprises a voltage level lower than the predetermined voltage when a mismatch is determined between the search data and data stored in the first SOT MTJ and the second SOT MTJ.
7. The CAM circuit of claim 1, comprising:
a separate write path and read path arrangement, wherein the write path includes the supplemental write input coupled to the SHE layer of the first SOT MTJ and the SHE layer of the second SOT MTJ, and wherein the read path includes the first read transistor controlled by the first search line and the second read transistor controlled by the second search line.
8. The CAM circuit of claim 1, wherein the SHE layer of the first SOT MTJ element and the SHE layer of the second SOT MTJ element each comprise a Spin Hall Metal (SHM) material comprising one of beta (β)-tungsten and β-tantalum.
9. The CAM circuit of claim 1, wherein the first search line and the second search line accept tri-state inputs.
10. A Ternary Content Addressable Memory (TCAM) array, comprising:
a plurality of TCAM cells arranged in columns and rows, wherein the TCAM cells of each column are coupled via an associated search line and the TCAM cells of each row are coupled via an associated match line;
each of the plurality of TCAM cells includes:
a first Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) element having a pinned layer coupled to a first search control transistor controlled by a first search line and having a Spin Hall Effect (SHE) layer coupled in a first configuration across a supplemental write input;
a second SOT MTJ element having a pinned layer coupled to a second search control transistor controlled by a second search line and having a SHE layer coupled in a second configuration across the supplemental write input;
a bias transistor configured to provide a bias voltage to drain terminals of the first search control transistor and the second search control transistor; and
a voltage keeper element coupling the drain terminals to a corresponding match line.
11. The TCAM array of claim 10, each of the plurality of TCAM cells further comprising:
the SHE layer of the first SOT MTJ element, the SHE layer coupled to the supplemental write input in the first configuration through a first write control transistor controlled by a first write control line; and
the SHE layer of the second SOT MTJ element coupled to the supplemental write input in the second configuration through a second write control transistor controlled by a second write control line.
12. The TCAM array of claim 11, comprising:
a control circuit configured to write data to the TCAM cell by at least:
enabling a first write control transistor of the TCAM cell to establish a first current through the SHE layer of the first SOT MTJ element that changes a magnetization state of the TCAM cell according to the data; and
enabling a second write control transistor of the TCAM cell to establish a second current through the SHE layer of the second SOT MTJ element that changes a magnetization state of the TCAM cell according to a complementary version of the data.
13. The TCAM array of claim 10, each of the plurality of TCAM cells further comprising:
a precharge element to couple the corresponding match line to a predetermined voltage in response to a precharge control signal.
14. The TCAM array of claim 10, comprising:
a control circuit configured to search for data in the TCAM cell by at least:
after precharging the associated match line to a predetermined voltage, when search data is present on the associated search lines, evaluating a data match status of the TCAM cells by enabling at least the first search control transistor and the second search control transistor of the TCAM cells to responsively output a match result voltage on the associated match line representative of the data match status of the TCAM cells.
15. The TCAM array of claim 14, wherein the match result voltage for each row includes the predetermined voltage when a match is determined between the search data and data stored in TCAM cells searched in each row, and wherein the match result voltage for each row includes a voltage level below the predetermined voltage when a mismatch is determined between the search data and data stored in TCAM cells searched in each row.
16. The TCAM array of claim 10, each of the plurality of TCAM cells further comprising:
a separate write path and read path arrangement, wherein the write path includes the supplemental write input coupled to the SHE layer of the first SOT MTJ and the SHE layer of the second SOT MTJ, and wherein the read path includes the first search control transistor controlled by the first search line and the second search control transistor controlled by the second search line.
17. The TCAM array of claim 10, wherein the SHE layer of the first SOT MTJ element and the SHE layer of the second SOT MTJ element each comprise a Spin Hall Metal (SHM) material comprising one of beta (β)-tungsten and β-tantalum.
18. A method of operating a Ternary Content Addressable Memory (TCAM) cell, comprising:
writing data in a first spin-orbit-torque (SOT) Magnetic Tunnel Junction (MTJ) element by establishing a first current through a first write control transistor and a SHE layer of the first SOT MTJ element, the first current changing a magnetization state of the first SOT MTJ element in accordance with the data; and
writing data in a second SOT MTJ element by establishing a second current through a second write control transistor and a SHE layer of the second SOT MTJ element, the second current changing a magnetization state of the second SOT MTJ element according to a complementary version of the data.
19. The method of claim 18, further comprising:
precharging a match indicator line to a predetermined voltage;
receiving supplemental search data presented on the first search line and the second search line; and
evaluating a data match state between the first SOT MTJ element and the second SOT MTJ element by enabling at least a first read transistor controlled by the first search line and coupled to a pinned layer of the first SOT MTJ and enabling a second read transistor controlled by the second search line and coupled to a pinned layer of the second SOT MTJ; and
presenting a match result voltage representative of the data match state on the match indicator line.
20. The method of claim 19, wherein the match result voltage comprises the predetermined voltage when a tri-state match is determined between the search data and data stored in the first SOT MTJ and the second SOT MTJ, wherein the match result voltage comprises a voltage level lower than the predetermined voltage when a mismatch is determined between the search data and data stored in the first SOT MTJ and the second SOT MTJ.
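The write and search method of claims 18 and 19 can be modeled in software. The following is a hedged behavioral sketch only (the class and method names are illustrative, not from the patent): data is stored as complementary magnetization states in two SOT MTJ elements, and a search leaves the match line at its precharge level only on a match or a wildcard ("don't care") input.

```python
# Behavioral sketch of a single TCAM cell per claims 18-19.
# All names are hypothetical; this is not the patent's implementation.

class TcamCell:
    def __init__(self):
        self.mtj1 = 0  # magnetization state of the first SOT MTJ element
        self.mtj2 = 1  # complementary state of the second SOT MTJ element

    def write(self, bit: int) -> None:
        # SHE write currents set complementary states in the two MTJs
        # (claim 18: data, and its complementary version).
        self.mtj1 = bit
        self.mtj2 = 1 - bit

    def search(self, key) -> bool:
        # 'X' models the tri-state "don't care" input: both search lines
        # are held off, so the cell cannot discharge the match line.
        if key == 'X':
            return True
        # With complementary search lines, a mismatch creates a
        # discharge path that pulls the match line below its precharge
        # voltage; a match leaves the precharge voltage in place.
        return self.mtj1 == key

cell = TcamCell()
cell.write(1)
print(cell.search(1))    # True: match line keeps the precharge voltage
print(cell.search(0))    # False: match line discharges (mismatch)
print(cell.search('X'))  # True: wildcard never discharges the match line
```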
Technical Field
Aspects of the present disclosure relate to the field of content addressable memories and artificial neural networks.
Background
Content Addressable Memories (CAMs) are data storage arrangements that allow fast searching in stored data using input values. Random Access Memory (RAM) uses an input memory address to retrieve data at a particular address. In contrast, a CAM accepts input data or input tags to determine whether the input data is held within the CAM, and if found, generates one or more storage addresses corresponding to matching input data within the CAM. When employed in an arrangement known as associative memory, one or more memory addresses determined by the CAM may then be input to random access memory to produce an output value based on those memory addresses. Another form of CAM is known as Ternary Content Addressable Memory (TCAM), which allows wildcard, "don't care," or undefined portions of input data. This TCAM arrangement may be useful when not all of the digits of the input data are known, and a list of addresses matching the input data pattern with wildcard values may be generated from the CAM. However, TCAM implementations require at least three states to be encoded for each bit, rather than two states for more traditional CAMs (known as binary CAMs).
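The lookup behavior described above can be illustrated with a short software model (a hypothetical sketch; the function names are illustrative and not part of the patent). Stored entries and the search key use 'X' for the wildcard digit, and a search returns the addresses of all matching entries:

```python
# Software model of TCAM lookup: content in, matching addresses out.

def ternary_match(stored: str, key: str) -> bool:
    """True when every digit matches or either side is a wildcard."""
    return len(stored) == len(key) and all(
        s == k or s == 'X' or k == 'X' for s, k in zip(stored, key)
    )

def tcam_search(table, key: str):
    """Return the addresses (indices) of all entries matching the key."""
    return [addr for addr, entry in enumerate(table) if ternary_match(entry, key)]

table = ["1010", "10X0", "0110"]
print(tcam_search(table, "1010"))  # entries 0 and 1 match: [0, 1]
print(tcam_search(table, "01X0"))  # only entry 2 matches: [2]
```

The returned addresses could then index a conventional RAM, giving the associative-memory arrangement described above.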
Various TCAM implementations have been tried, but these implementations suffer from large power consumption, high semiconductor footprint, and limited speed. For example, Complementary Metal Oxide Semiconductor (CMOS) based CAMs can have large static power consumption and large area overhead due to the high transistor count of CMOS based cells. Another TCAM implementation employs Spin Transfer Torque (STT) Magnetic Random Access Memory (MRAM) cells. However, these arrangements have limited search speed, in part due to the low Tunnel Magnetoresistance (TMR) characteristics of STT MRAM configurations. Furthermore, the high write currents required for STT MRAM based TCAMs result in undesirable power consumption and larger feature sizes of the read/write support circuitry.
CAM and TCAM are commonly used in network routing devices. However, these memory structures may also be used in Artificial Neural Networks (ANNs). An ANN may be formed from individual artificial neurons simulated using software, integrated hardware, or other discrete components. Neuromorphic computing may employ ANNs, focusing on the use of electronic components such as analog/digital circuitry in integrated systems to mimic the human brain and to better understand the neuro-biological architecture of the nervous system. Neuromorphic computing emphasizes implementing models of the nervous system to understand how the morphology of individual neurons, synapses, circuits, and architectures leads to desirable computations.
Disclosure of Invention
Ternary Content Addressable Memory (TCAM) circuits are provided herein. In one exemplary embodiment, a TCAM circuit can include a first Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) element having a pinned layer coupled to a first read transistor controlled by a first search line and having a Spin Hall Effect (SHE) layer coupled in a first configuration across a supplemental write input. The TCAM circuit may include a second SOT MTJ element having a pinned layer coupled to a second read transistor controlled by a second search line and having a SHE layer coupled in a second configuration across a supplemental write input. The TCAM circuit may include: a bias transistor configured to provide a bias voltage to drain terminals of the first read transistor and the second read transistor; and a voltage keeper element coupling the drain terminal to the match indicator line.
Drawings
Many aspects of the disclosure can be better understood with reference to the following drawings. While several embodiments are described in connection with these drawings, the disclosure is not limited to the embodiments disclosed herein. On the contrary, the intent is to cover all alternatives, modifications and equivalents.
FIG. 1 illustrates a content addressable memory in an embodiment.
Fig. 2 shows an artificial neural network in an embodiment.
FIG. 3 illustrates a content addressable memory in an embodiment.
Fig. 4 shows an exemplary circuit with a spin orbit torque device in an embodiment.
FIG. 5 illustrates an exemplary content addressable memory having spin orbit torque devices in an embodiment.
FIG. 6 illustrates an exemplary operation of a content addressable memory having spin orbit torque devices in an embodiment.
FIG. 7 illustrates an exemplary operation of a content addressable memory having spin orbit torque devices in an embodiment.
FIG. 8 illustrates a computing system for hosting or controlling an artificial neural network or a content addressable memory having spin orbit torque devices, according to an embodiment.
FIG. 9 illustrates exemplary performance of a magnetic tunnel junction device.
Detailed Description
In the discussion herein, various enhancement circuits are presented. These enhancement circuits may include Content Addressable Memory (CAM) elements to further speed up operation and reduce power consumption of the neural network, among other applications. For example, the CAM structure may be employed in any content addressable memory and any content addressable memory application. A Ternary Content Addressable Memory (TCAM) structure is discussed herein that allows for the generation of matching results using wildcard, "don't care" or undefined portions of the input data. One such TCAM structure discussed herein includes two Spin Hall Effect (SHE) Magnetoresistive Random Access Memory (MRAM) cells that employ Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) elements.
CAM and TCAM structures typically employ non-volatile memory (NVM) elements to store data that can be searched using input data or input tags. Past attempts at CAM/TCAM structures include Complementary Metal Oxide Semiconductor (CMOS) structures implementing Static Random Access Memory (SRAM) elements. While feasible for CAM/TCAM, CMOS based cells are large and consume more power than other cell types. Therefore, CMOS based cells are not desirable for use in neural network TCAM structures, such as those discussed below in fig. 2.
The CAM/TCAM structure may alternatively be formed using a Magnetic Tunnel Junction (MTJ) or various resistive memory devices, such as memristors. The MTJ element may be used to form a data storage element. The MTJ operates using a Tunnel Magnetoresistance (TMR), which is a magnetoresistive effect. MTJs typically consist of two layers of ferromagnetic material separated by a thin insulator layer through which electrons can quantum-mechanically tunnel from one ferromagnetic layer into the other. One ferromagnetic layer of the MTJ may be referred to as a pinned layer having a fixed magnetization state, while the other ferromagnetic layer of the MTJ includes a free layer that can change magnetization state. The intermediate layer, which comprises a thin insulator separating two ferromagnetic layers, may be formed of an oxide material or other suitable electrical insulator. Electrical terminals may be formed to interface the free and pinned layers of the MTJ with other components in the circuit.
The MTJ element may generally be placed in two different states, which may correspond to different logical values stored therein. These states depend on the magnetization state of the MTJ element, which corresponds to the magnetoresistive value currently assumed by the MTJ element. The variable magnetization state of the MTJ elements discussed herein can be varied between two states, a parallel state and an anti-parallel state. The parallel state occurs when the free and pinned layers of the MTJ element are in the same magnetization state. The antiparallel state occurs when the free layer and the pinned layer of the MTJ element are in different magnetization states. Logical values may be assigned to magnetization states, such as a logical "0" for an anti-parallel state and a logical "1" for a parallel state, among other configurations.
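The mapping from magnetization state to resistance and logic value described above can be sketched numerically (the resistance value and TMR ratio below are assumed for illustration and are not taken from the patent):

```python
# Illustrative MTJ state model. R_P and TMR are assumed example values.

R_P = 5_000.0   # parallel-state resistance in ohms (assumed)
TMR = 1.0       # tunnel magnetoresistance ratio: (R_AP - R_P) / R_P (assumed)

def mtj_resistance(free_parallel_to_pinned: bool) -> float:
    """Resistance implied by the magnetization state via the TMR relation."""
    return R_P if free_parallel_to_pinned else R_P * (1.0 + TMR)

def logic_value(free_parallel_to_pinned: bool) -> int:
    # Assignment used in the text: parallel -> logic "1", antiparallel -> "0".
    return 1 if free_parallel_to_pinned else 0

print(mtj_resistance(True), logic_value(True))    # 5000.0 1 (parallel)
print(mtj_resistance(False), logic_value(False))  # 10000.0 0 (antiparallel)
```

A larger TMR widens the gap between the two resistance states, which is why the higher TMR of SOT MTJ devices (discussed below) eases read sensing.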
The MTJ types may include various configurations that may be employed in artificial neural network circuits and CAM/TCAM circuits. MTJ devices typically employ spin-polarized current to reversibly switch the magnetization state of a ferromagnetic layer, i.e., the magnetization state of the free layer described above. A perpendicular or parallel arrangement of MTJ elements may be employed, which refers to a type of magnetic anisotropy associated with a preferred alignment direction of magnetic moments within the MTJ elements relative to the corresponding semiconductor substrate surface. A first type of MTJ configuration includes a uniform perpendicular Spin Transfer Torque (STT) device, which typically includes a 2-terminal device formed of at least three stacked material layers. The trilayer includes a tunnel barrier layer disposed between the pinned layer and the free layer. The free layer and the pinned layer are coupled to two terminals of the STT MTJ.
STT MTJ based TCAM cells have been developed, which may consist of several control transistors and two STT MTJ elements. This structure has advantages over the CMOS configuration described above, including little static power consumption, a more compact size, and a reduced transistor count. Also, STT MTJ based TCAM configurations may use shared or separate Write Lines (WL) and read/Search Lines (SL). Separate write lines and search lines may have structural advantages over CMOS designs. However, in STT MTJ based circuits, the search speed (read speed) is limited, in part due to the low Tunnel Magnetoresistance (TMR) characteristics of STT MTJ elements and the higher relative write current inherent in STT configurations. Therefore, to increase TCAM search speed, a larger size STT MTJ must be employed. Also, due to the circuit configuration, read disturb may be encountered.
Due to limitations and performance issues of CMOS and STT MTJ based TCAM designs, MTJ based enhancements are now being proposed. One such MTJ-based design discussed herein includes two Spin Hall Effect (SHE) Magnetoresistive Random Access Memory (MRAM) cells that employ heterogeneous in-plane Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) elements. This enhanced design may be used for any CAM or TCAM, which may or may not be employed in neural network applications. Accordingly, these enhanced cell circuit structures may be provided to a general content addressable memory structure.
The SOT MTJ device typically comprises a 3-terminal device. The SOT MTJ device may have an additional metal bottom layer terminal and other differences compared to the two-terminal STT MTJ device. In these SOT MTJ configurations, separate "write" and "read" current paths are provided, which may allow for longer device lifetimes. In an SOT MTJ device, the write current is carried through a separate underlayer rather than through the tunnel barrier layer, as occurs in STT MTJ elements. The write current through the tunnel barrier layer in the STT MTJ element may cause more wear and damage to the tunnel barrier layer material than in the SOT MTJ element. Furthermore, when separate write and read paths are employed, the read and write control elements (such as read or write control transistors) may have smaller relative dimensions compared to the STT MTJ structure. This is due in part to the greater Tunnel Magnetoresistance (TMR) of the SOT MTJ configuration compared to the STT MTJ structure. In particular, the SOT MTJ device may employ a higher TMR than the STT MTJ device, which may reduce the write and read energy required for the CAM/TCAM structures discussed herein. The reduced required energy corresponds to less read/write current required in the SOT MTJ configuration compared to other MRAM or CMOS structures, which may result in longer device lifetimes.
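The separate write and read paths of a 3-terminal SOT MTJ can be summarized in a small behavioral model (a sketch under assumed values; the class name, current polarities, and resistances are illustrative only, not the patent's device physics):

```python
# Behavioral sketch of a 3-terminal SOT MTJ: the write current flows
# only through the spin Hall metal underlayer (never the tunnel
# barrier), and its polarity selects the free-layer state; the read
# path crosses the tunnel barrier and senses resistance.

class SotMtj:
    def __init__(self):
        self.free_parallel = True  # free layer parallel to pinned layer

    def write(self, channel_current: float) -> None:
        # Spin-orbit torque from the in-plane SHM charge current switches
        # the free layer; polarity (assumed convention) picks the state.
        self.free_parallel = channel_current > 0

    def read(self, r_parallel: float = 5e3, tmr: float = 1.0) -> float:
        # Read senses the tunnel-barrier resistance; no write current
        # ever stresses the barrier in this path.
        return r_parallel if self.free_parallel else r_parallel * (1 + tmr)

d = SotMtj()
d.write(-100e-6)   # negative channel current -> antiparallel state
print(d.read())    # high resistance: 10000.0
d.write(+100e-6)   # positive channel current -> parallel state
print(d.read())    # low resistance: 5000.0
```

Because the write path never crosses the tunnel barrier, barrier wear is avoided, which is the lifetime advantage described above.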
The SOT MTJ may also be referred to herein as a Spin Hall Effect (SHE) MTJ, where the metal underlayer comprises Spin Hall Metal (SHM). An exemplary SHE MTJ structure is shown in fig. 4 below. Other methods may replace the SOT in-plane MTJ with an SOT perpendicular MTJ with an external electric field applied. However, this external field may reduce the thermal stability of adjacent circuits.
Before discussing enhanced SOT/SHM MTJ based CAM/TCAM structures, the relative performance between the various CAM/TCAM memory cell technologies will be briefly introduced. FIG. 9 shows a performance comparison between various techniques for implementing memory cell structures that may be employed in CAM/TCAM designs. FIG. 9 shows that read performance is increased by increasing thermal stability (Δ) in the SHE-MRAM architecture. As can be seen from graphs 900-902 in FIG. 9, the SHE-MRAM structure exhibits a small write overhead even with a higher thermal stability (Δ) compared to the STT-MRAM structure. A higher Δ also allows for a larger read current. Thus, by using the SHE-MRAM architecture, read latency can be reduced with minimal sacrifice of write latency.
In graph 900 of FIG. 9, a comparison of write delay behavior between an STT-MRAM device and an SHE-MRAM device is shown as the thermal stability indicated along the horizontal axis increases. It can be seen that the SHE-MRAM device maintains a lower write delay over a greater range of thermal stability than the STT-MRAM device. In graph 901 of FIG. 9, the read current of the SHE-MRAM device is shown over the range of thermal stability along the horizontal axis. In graph 902 of FIG. 9, the read delay of the SHE-MRAM device is shown over the range of thermal stability along the horizontal axis. The read current and read delay performance may be affected by the material selection, material purity, and material composition of the underlying layer of the SHE-MRAM structure.
Table 903 in FIG. 9 shows the performance of the SHE-MRAM employed in a Level 2 (L2) cache structure. In Table 903, a comparison is made between various types of memory cell structures, such as CMOS (SRAM), STT-MRAM, and SHE-MRAM structures. The SHE-MRAM device provides similar read delay to, but lower leakage and denser area utilization than, the SRAM device. The SHE-MRAM device also performs better than STT-MRAM devices, with reduced bit cell failure rates, while STT-MRAM and SHE-MRAM devices have similar footprints. Furthermore, the higher Tunnel Magnetoresistance (TMR) of SHE-MRAM devices can be employed to reduce the read energy required for the CAM/TCAM structures discussed herein.
Turning now to an enhanced architecture for implementing Content Addressable Memory (CAM) and Ternary Content Addressable Memory (TCAM) devices, fig. 1 is presented. As described above, a CAM is a memory that allows for the lookup of stored data via input search data instead of data addresses (as is done with most random access memory devices). The CAM compares the incoming search data with the stored data table and returns the address of the matching data. This address can then be used to retrieve the data itself from memory. Various types of CAMs may be formed, one exemplary type of which is referred to as ternary CAM (TCAM). Conventional CAMs require binary formatting of the input search data. However, TCAMs allow the use of a third state (tri-state), "don't care" or wildcard, for undefined portions of the input search data, so that there is no need to present an exact input search data instance to the TCAM to generate a resulting address. Fig. 1 shows an
In fig. 1,
Also shown in fig. 1 is a
One exemplary application of the CAM/TCAM structure is an Artificial Neural Network (ANN). Conventional processing equipment as well as specialized circuitry may be used to form the artificial neural network. Exemplary processing devices include a Central Processing Unit (CPU) or Graphics Processing Unit (GPU) that is suitable for machine learning applications, such as image recognition, speech recognition, handwriting recognition, and other applications.
To perform machine learning operations, CPUs may be a limited choice due to the architectural design of most CPUs. For example, CPUs are good at processing a very complex instruction set very efficiently but lack parallelism. In machine learning calculations, especially training operations, the basic operation is vector matrix multiplication. GPUs, which have begun to gain favor over CPUs, use a parallel architecture and are adept at handling many parallel very simple instruction sets. Another emerging option is ASICs, such as Tensor Processing Units (TPU), which are adept at performing a particular task. As machine learning is increasingly integrated into everyday applications, there is an increasing interest in making these specialized chips for machine learning tasks and making existing processor-based designs more efficient.
In addition to data processing speed, another problem in machine learning and neural networks is power consumption. A machine learning task may consume up to several hundred watts on a GPU or TPU, in contrast to the human brain, which performs similar cognitive tasks using only about 20 watts. This high power consumption disadvantage has motivated research into biologically inspired or brain-inspired methods (such as neuromorphic computing) to address machine learning power consumption issues. As will be discussed below, another approach is taken to reduce power consumption when using a GPU or other processing device. This enhanced approach uses a new structure of Content Addressable Memory (CAM) devices to supplement the operation of GPUs in neural networks.
As described above, an ANN such as a Convolutional Neural Network (CNN) may be implemented using a multi-core processor formed by GPUs. In GPU architectures, a Floating Point Unit (FPU), which includes an Adder (ADD) element, a Multiplier (MUL) element, and a multiply accumulator element in a streaming core that processes data, consumes a large amount of energy. The enhanced CAM/TCAM example herein may be used in an ANN using a processor (such as a GPU). In particular, a Content Addressable Memory (CAM) may be coupled to a processing pipeline in a GPU or other processor. The CAM may be used to store high frequency patterns encountered by the CNN. These high frequency patterns can be efficiently searched via search terms or search data, which can provide significant computational reduction and power savings in CNNs.
Fig. 2 illustrates an
In FIG. 2, input 201 is presented to a pipeline corresponding to FPU stages 211-215 of FPU 210 within a GPGPU. A floating point result (Q_FPU) results from processing the input through the FPU pipeline. Input 201 is also simultaneously presented to TCAM 221 as input search data. When the search is successful in TCAM 221 (e.g., a search hit), then hit indicator signal 223 is presented to pipeline control circuit 216. Pipeline control circuit 216 may include a clock circuit that provides a clock signal to FPU stages 211-215. The hit indicator signal 223 indicates to the clock circuit of the pipeline control circuit 216 to gate or otherwise disable the clock signal to FPU stages 211-215. Once a hit results in a data output from the memory associated with TCAM 221, a corresponding result (Q_AM) is provided as the output of the pipeline instead of the Q_FPU result.
As can be seen in FIG. 2, when the input is contained in associative memory 220, then the input need not be pipelined by the FPU. Significant power savings may be achieved by disabling the FPU pipeline in these cases using the hit indicator signal 223 and clock circuit control. In addition to clock control circuitry, other ways of controlling the FPU pipeline may be implemented, such as power control gating of FPU pipeline circuit elements, logic disabling of FPU pipeline circuits, or other techniques. However, when the input operands are not contained in associative memory 220, then the FPU pipeline may be enabled to process the input to produce a result. The selection of either the TCAM based result or the FPU based result is based on whether a match in the TCAM is indicated.
Associative memory 220 is updated to hold frequent results from processing input operand data through the FPU pipeline. The updating may be performed based on various criteria, such as when the results are similar to previous results, or using each result during an initialization period until the associative memory is filled to capacity. Subsequent hits may indicate that the resulting data is allowed to remain in the associative memory, and data having few hits associated therewith may be replaced with new results provided by the FPU pipeline. Since TCAM 221 may indicate a successful hit based on the tri-state formatted data, there is no need to present an exact match as an input operand to produce a hit. In machine learning applications and other neural network applications, this match/hit may be sufficient to produce a result and eliminate power consumption of the FPU pipeline stages.
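The hit-gating scheme described above can be sketched in software (a hedged analogy only; the fill-until-full update policy, the multiply stand-in for the FPU, and all names are illustrative assumptions, not the patent's method): an associative memory of frequent operand patterns short-circuits the FPU pipeline on a hit, and the pipeline fills the memory on a miss.

```python
# Software analogy of fig. 2: associative-memory lookup gates the FPU.

def fpu_pipeline(a: float, b: float) -> float:
    """Stand-in for the multi-stage FPU computation (here: multiply)."""
    return a * b

class AssociativeMemory:
    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.table = {}

    def lookup(self, key):
        return self.table.get(key)  # None models a TCAM miss

    def update(self, key, result):
        # Simple fill-until-full policy (one of the criteria mentioned).
        if len(self.table) < self.capacity:
            self.table[key] = result

am = AssociativeMemory()

def compute(a: float, b: float) -> float:
    hit = am.lookup((a, b))
    if hit is not None:
        return hit               # hit: FPU clock gated, use the Q_AM result
    result = fpu_pipeline(a, b)  # miss: run the pipeline for the Q_FPU result
    am.update((a, b), result)
    return result

print(compute(3.0, 4.0))  # miss: pipeline computes 12.0 and fills the table
print(compute(3.0, 4.0))  # hit: same 12.0 returned from associative memory
```

A hardware TCAM additionally allows wildcard digits in the key, so a hit does not require the exact operand pattern, as the text notes.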
Fig. 3 presents a schematic diagram 300 of a content addressable memory arrangement.
Control circuitry 360 is configured to write data into the content addressable memory cells.
To implement a TCAM, various methods can be employed. Referring to fig. 3, each "cell" (C) component in the content addressable memory can be implemented as discussed below.
In fig. 4, a control circuit, not shown, controls the direction of charge movement through underlying material 431.
Also shown in fig. 4 are the peripheral structures used to form an MRAM configuration that allows data to be written and read through changes in the magnetization state of MTJ element 420. These structures include a read control transistor (440) and a write control transistor (441), as well as various control lines.
The operation of the SHE-MRAM architecture shown in fig. 4 may be performed in accordance with the voltages presented in table 401. In operation, the first control transistor 440, or read switching element, controls the read path and is coupled to the pinned layer of MTJ element 420.
Advantageously, the SHE-MRAM architecture in FIG. 4 provides a low channel current per unit of thermal stability (Δ), i.e., a low I_CHANNEL/Δ, where I_CHANNEL is the current through underlying material 431, together with efficient spin generation per unit of charge current (i.e., I_SPIN/I_CHARGE > 100%). Exemplary materials for underlying material 431 include heavy metals exhibiting a strong spin Hall effect, such as platinum, tantalum, or tungsten.
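The I_SPIN/I_CHARGE > 100% figure follows from the geometry of SOT devices. A commonly used approximation (an assumption brought in here, not stated in this document) is I_spin/I_charge ≈ θ_SH · L_MTJ/t_SHM, where θ_SH is the spin Hall angle of the channel material, L_MTJ is the MTJ length along the current direction, and t_SHM is the channel film thickness:

```python
# Back-of-envelope spin-current gain for a spin Hall channel.
# Approximation (assumption): I_spin / I_charge ~ theta_SH * (L_MTJ / t_SHM).

def spin_current_gain(theta_sh: float, l_mtj_nm: float, t_shm_nm: float) -> float:
    """Dimensionless ratio of spin current delivered to charge current supplied."""
    return theta_sh * (l_mtj_nm / t_shm_nm)

# Illustrative numbers: theta_SH ~ 0.3 (e.g., beta-tungsten), a 60 nm MTJ,
# and a 4 nm channel film give a gain of 4.5, i.e., well above 100%.
gain = spin_current_gain(0.3, 60.0, 4.0)
```

Because the same charge current acts on the free layer along its whole length, the delivered spin current can exceed the supplied charge current, which is the efficiency advantage the text refers to.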
We turn now to an exemplary implementation of a TCAM cell based on SHE-MRAM devices. FIG. 5 shows an enhanced CAM/TCAM cell structure with SHE (SOT)-MRAM elements. In particular, fig. 5 shows a TCAM cell formed from two SHE-MRAM structures (540, 541) and associated control elements.
As described above,
It should be noted that the "B" suffix used on the control lines in fig. 5 indicates a logical complement of the companion signal. For example, line SLB (517) is a logically complementary version of line SL (516). A complementary version refers to a logical negation or inversion of a particular signal or logical value. For example, when line SL is at a particular voltage level (such as V_DD, or a logic "1"), then line SLB is at the complementary voltage level (i.e., 0 V, or a logic "0").
We turn now to the structure of the enhanced TCAM cell of fig. 5.
Additional control elements are included in the TCAM cell structure of fig. 5. First, precharge transistor 530 (M0) acts as a precharge component configured to pull ML 511 to a predetermined voltage level (such as V_DD) prior to a read/search operation, responsive to the precharge control signal (PC).
Another element 536 (M6) is included as a voltage keeper coupled between the match line and the bias voltage.
Six transistors are employed in the exemplary SHE-MRAM device of fig. 5 to implement separate read and write paths. Thus, compared to a four-transistor STT-MRAM device, a larger structural footprint might be assumed. However, the individual transistors can be made relatively small: M1 and M2 serve only read operations, and the write control transistors M3 and M4 carry only the relatively low SOT-MRAM write current.
As described above, the first SHE-MRAM structure includes an SHM layer coupled to the supplemental write inputs 512, 513 (BL, BLB) in a first configuration, which refers to a particular arrangement and set of connections between the supplemental write inputs and the first SHM layer.
The SHM 545 is coupled to the supplemental write inputs 512, 513 (BL, BLB) in a second configuration. As used herein, the second configuration refers to a particular arrangement and set of connections between the supplemental write inputs 512, 513 (BL, BLB) and the second SHM layer 545.
The operation of the SHE-MRAM device based TCAM cell is discussed below.
Data values may be written into the TCAM prior to a search/read operation of the TCAM. WL1 is configured to control write control transistor 533 (M3) and write data into the left MTJ structure, where the data is based on the direction of current between the BL/BLB lines. WL2 is configured to control write control transistor 534 (M4) and write data into the right MTJ structure, likewise based on the current between the BL/BLB lines. WL1 and WL2 may be enabled simultaneously or sequentially, depending on the data to be written, in part because the BL/BLB lines are shared. FIG. 5 shows exemplary write values corresponding to the BL/BLB lines in the accompanying truth table.
The write process for each of the SHE-MRAM structures 540-541 may write a logical "1" or "0" to the corresponding MTJ element.
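The per-structure write can be modeled behaviorally. In this Python sketch (the resistance values, the sign convention for current direction, and the logic-level assignment are illustrative assumptions, not taken from fig. 5), the polarity of the charge current driven across the SHM via BL/BLB sets the free-layer magnetization, and the MTJ resistance then reflects parallel or antiparallel alignment with the pinned layer:

```python
# Behavioral model of a SOT (SHE-MRAM) write: current direction through the
# spin Hall metal sets the free layer; the pinned layer is fixed.
# Resistance values and polarity conventions are illustrative assumptions.

R_P  = 5_000   # ohms, parallel state (logic "0" under this convention)
R_AP = 10_000  # ohms, antiparallel state (logic "1")

class SotMtj:
    def __init__(self):
        self.free = +1          # free-layer magnetization (+1 or -1)
        self.pinned = +1        # pinned-layer magnetization, fixed

    def write(self, bl: int, blb: int):
        """Drive current across the SHM via BL/BLB; only the sign matters."""
        if bl == blb:
            return              # no potential difference -> no switching
        self.free = +1 if bl > blb else -1

    def resistance(self) -> int:
        return R_P if self.free == self.pinned else R_AP

    def read_bit(self) -> int:
        return 0 if self.resistance() == R_P else 1
```

Under this convention, driving BL high relative to BLB leaves the layers parallel (low resistance, logic "0"), while reversing the polarity writes the antiparallel, high-resistance state (logic "1"); equal potentials leave the stored bit untouched.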
In fig. 6, the SHE-MRAM based TCAM cell provides a two-step search/read operation, namely a precharge phase and an evaluation phase. In the precharge phase, the read control transistors (M1, M2) are turned off via the SL/SLB lines, and the match line (ML) 511 is precharged by using the precharge control signal (PC) to enable the precharge transistor (M0).
Table 600 shown in fig. 6 further shows the correspondence among stored data, search lines, and match lines in the TCAM cell.
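The two-phase search can be sketched as follows. This Python model assumes a NOR-style match line, a convention common to TCAMs generally (table 600's exact polarity assignments are not reproduced here): ML is precharged high, and during evaluation any mismatching cell provides a discharge path, so ML remaining high indicates a match.

```python
# Two-phase TCAM search sketch: precharge, then evaluate.
# NOR-style convention assumed: a mismatch discharges the match line (ML).

def tcam_search(stored_word, key_bits):
    """stored_word: list of '0', '1', or 'X' (don't care); key_bits: 0/1 ints."""
    ml = 1                                        # precharge phase: ML pulled high
    for stored, k in zip(stored_word, key_bits):  # evaluation phase
        if stored == 'X':
            continue                              # masked cell never discharges ML
        if int(stored) != k:
            ml = 0                                # mismatch discharges ML
    return ml                                     # 1 = match/hit, 0 = miss
```

The 'X' branch captures the ternary "don't care" state: a masked cell can never discharge ML, so it matches any search bit.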
Control circuitry (such as discrete control logic, integrated control logic, processing devices, firmware, software, or other control elements) may be used to control the operation of the circuitry presented in fig. 5. The control circuit may be integrated with one or more instances of the circuit in fig. 5. For example, when the circuit in fig. 5 is used to form a TCAM cell array, then the control circuit may be coupled to shared control lines for TCAM cells arranged in rows and columns. Specifically, the SL/SLB, WL1/WL2, and BL/BLB lines may be coupled to an input driver circuit as shown in FIG. 3. The Match Line (ML) may be coupled to a sense amplifier and further output circuitry for presenting a data match status to further circuitry or processing equipment. When used in a neural network circuit, the TCAM array may be used to speed up operation of the neural network and reduce power consumption of the neural network when a match of input data is found in the TCAM array, as shown in fig. 2.
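At the array level, rows of cells share the SL/SLB (and WL/BL) lines per bit position while each row drives its own match line, so one search evaluates all rows in parallel. A sketch of the resulting match bit vector follows (the list-based encoding is an illustrative assumption):

```python
# Array-level TCAM search sketch: every stored row is compared against the
# search key in parallel; the per-row match lines form a bit vector.

def tcam_array_search(rows, key_bits):
    """rows: list of stored words, each a list of '0'/'1'/'X' (don't care)."""
    def row_match(word):
        return all(s == 'X' or int(s) == k for s, k in zip(word, key_bits))
    return [1 if row_match(word) else 0 for word in rows]

# Downstream encoder/multiplexer logic (as with the sense amplifiers and
# output circuitry mentioned above) could reduce this vector to a hit
# address or a single hit/miss flag.
```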
In another exemplary circuit that may be used as a TCAM cell, a first Magnetoresistive Random Access Memory (MRAM) structure includes a first Magnetic Tunnel Junction (MTJ) element coupled to a first Spin Hall Metal (SHM) layer at a corresponding free layer. The first MTJ element is coupled at a corresponding pinned layer to a first read control transistor (M1) controlled by a first Search Line (SLB). The first SHM layer includes a first terminal coupled to a first write control transistor (M3) controlled by a first write line (WL1), and a second terminal coupled to a Bit Line (BL). The second MRAM structure includes a second MTJ element coupled to the second SHM layer at a corresponding free layer. The second MTJ element is coupled at a corresponding pinned layer to a second read control transistor (M2) controlled by a second Search Line (SL). The second SHM layer includes a first terminal coupled to a second write control transistor (M4) controlled by a second write line (WL2), and a second terminal coupled to a Bit Line (BL). The circuit may include a bias transistor (M5), the bias transistor (M5) configured to provide a bias voltage to the first read control transistor (M1) and the second read control transistor (M2). When used in an array having more than one TCAM cell sharing ML, the circuit may include a voltage keeper element (M6) coupled between the match line and the bias voltage, and a precharge element (M0) coupled to the Match Line (ML).
Fig. 7 presents additional exemplary operations of the circuits discussed above (such as the circuit of fig. 5). In fig. 7, the write process 700 is discussed in operations 701-702, and the search or read process is discussed in operations 703-704.
In the write process 700, the control circuit writes data into a first Spin Orbit Torque (SOT) Magnetic Tunnel Junction (MTJ) element of a TCAM cell. With respect to the elements of fig. 5, the control circuitry may write (701) data into the first MRAM structure (540) by enabling at least the first write control transistor 533 (M3) and presenting a write voltage level between a first terminal (520) of the first SHM layer and a second terminal coupled to the bit line (BL).
In a precharge operation (703), the control circuitry precharges ML 511 to a predetermined voltage level, such as V_DD, using the precharge control signal (PC).
After precharging ML 511, a search or read process can evaluate (704) the data match state within the TCAM cell by evaluating the input data against data previously stored in the first MRAM structure 540 and the second MRAM structure 541.
As discussed herein, a TCAM based on SHE-MRAM devices has several advantages over an STT-MRAM TCAM. For example, TCAMs based on SHE-MRAM devices achieve faster searching by employing larger read currents than TCAMs based on STT-MRAM devices. TCAMs based on SHE-MRAM devices are also more robust against data retention failures, in part because of the separate read/write paths enabled by the SOT-MRAM elements. The SOT-MRAM elements also enable lower write currents compared to CMOS-based TCAMs, while still maintaining near-zero static power consumption and a compact bit cell size. The manufacturing area overhead of the elements of the SHE-MRAM device based TCAM circuit detailed in FIG. 5 is also advantageous. For example, the M1, M2 transistors may be smaller than transistors in STT- or CMOS-based TCAMs, in part because M1, M2 are used only for read operations. Although two additional write control transistors (M3, M4) are employed in the SHE-MRAM device based TCAM, these transistors are relatively small due to the relatively low SOT-MRAM write current.
Thus, the SHE-MRAM configuration proposed herein can improve the read speed of TCAM cells with a larger read current by increasing thermal stability, since the corresponding write overhead is insignificant due to the high spin polarization efficiency. The TCAM based on SHE-MRAM devices can be successfully used in SOT-MRAM based artificial neural network engines, such as those discussed above in FIG. 2.
FIG. 8 illustrates a computing system 801 that represents any system or collection of systems in which the various operating architectures, scenarios, and processes disclosed herein can be implemented. For example, the computing system 801 may be used to implement the control circuitry of the elements of fig. 1, the control portions of the elements of fig. 2, and the FPU stage of fig. 2, the control circuitry 360 of fig. 3, the control circuitry of fig. 5, and other circuitry discussed herein. Further, computing system 801 may be used to store write data prior to storage to a TCAM unit and to store search results after the search process is completed. In another example, computing system 801 may configure interconnect circuitry to establish one or more arrays of TCAM cells or to connect TCAM cells into an artificial neural network circuit. In still other examples, computing system 801 may fully implement an artificial neural network, such as the artificial neural network shown in fig. 2, to create an at least partially software-implemented artificial neural network through an externally-implemented enhanced TCAM cell structure. Computing system 801 may implement control of any of the TCAM unit operations discussed herein, whether implemented using hardware or software components, or any combination thereof.
Examples of computing system 801 include, but are not limited to, computers, smart phones, tablet computing devices, notebook computers, desktop computers, hybrid computers, rack-mounted servers, web servers, cloud computing platforms, cloud computing systems, distributed computing systems, software-defined networking systems and data center devices, and any other type of physical or virtual machine, and other computing systems and devices, and any variations or combinations thereof.
Computing system 801 may be implemented as a single apparatus, system, or device, or may be implemented in a distributed fashion as multiple apparatuses, systems, or devices. Computing system 801 includes, but is not limited to, a processing system, storage system 803, software 805, and communication interface system 807.
Still referring to fig. 8, the processing system loads and executes software 805 from storage system 803.
Storage system 803 may include any computer-readable storage medium readable by the processing system and capable of storing software 805.
In addition to computer-readable storage media, in some embodiments, storage system 803 may also include computer-readable communication media over which at least some of software 805 may be communicated, internally or externally. Storage system 803 may be implemented as a single storage device, but may also be implemented across multiple storage devices or subsystems that are co-located or distributed relative to each other. Storage system 803 may include additional elements, such as a controller, capable of communicating with the processing system.
The software 805 may be implemented with program instructions and, among other functions, when executed by the processing system, may direct the processing system to operate as described herein.
In particular, the program instructions may include various components or modules that cooperate or otherwise interact to perform the various processes and operational scenarios described herein. The various components or modules may be embodied in compiled or interpreted instructions, or in some other variation or combination of instructions. The various components or modules may execute in a synchronous or asynchronous manner, in serial or parallel, in a single threaded environment or in multiple threads, or according to any other suitable execution paradigm, variant, or combination thereof. In addition to or including the
In general, the software 805, when loaded into the processing system and executed, transforms a suitable apparatus, system, or device from a general-purpose computing system into a special-purpose computing system customized to operate as described herein.
For example, if the computer-readable storage medium is implemented as a semiconductor-based memory, the software 805 may transform the physical state of the semiconductor memory when the program instructions are encoded therein, such as by transforming the state of transistors, capacitors, or other discrete circuit elements that make up the semiconductor memory. Similar transformations may occur with respect to magnetic media or optical media. Other transformations of physical media are possible without departing from the scope of the present description, with the foregoing examples provided only to facilitate this discussion.
The software 805 may include one or more services, such as TCAM read/write (R/W) service 824.
In one example, TCAM R/W service 824 controls various control lines (such as the WL1/WL2 and BL/BLB lines in fig. 5) to write data into a TCAM cell or array. The TCAM R/W service 824 can control the enabling/disabling of the write control transistors in the proper order to properly write data into the TCAM cells and TCAM array. The TCAM R/W service 824 may also perform search or read operations, controlling lines such as the PC, BIAS, and SL/SLB lines in FIG. 5 to search for matches to data previously written into a TCAM cell or TCAM array. The TCAM R/W service 824 can read results presented on the match lines (ML). In some examples, the TCAM R/W service 824 may implement output encoder/decoder or multiplexer logic to assemble discrete search result values into a bit vector, or to process multiple search match outputs generated by the TCAM. The TCAM R/W service 824 may transmit the resulting search match/no-match indication to one or more other systems via communication interface system 807, or present the indication to one or more users via a user interface.
When a TCAM structure is employed in a GPU-implemented ANN or CNN, similar control operations may be employed, for example to bypass FPU pipeline stages upon a search hit as in fig. 2.
The communication interface system 807 may include communication connections and devices that allow communication with other computing systems (not shown) over a communication network (not shown). The communication interface system 807 may also communicate with portions of the hardware implemented ANN, such as with layers of the ANN, or TCAM structures and circuits. Examples of connections and devices that together allow for inter-system communication may include NVM memory interfaces, network interface cards, antennas, power amplifiers, RF circuitry, transceivers, and other communication circuitry. The connections and devices may communicate over a communication medium, such as metal, glass, air, or any other suitable communication medium, to exchange communications or data with other computing systems or system networks.
The
Communication between computing system 801 and other computing systems (not shown) may occur over one or more communication networks and according to various communication protocols, combinations of protocols, or variations thereof. Examples include an intranet, the internet, a local area network, a wide area network, a wireless network, a wired network, a virtual network, a software defined network, a data center bus, a computing backplane, or any other type of network, combination of networks, or variations thereof. The above communication networks and protocols are well known and need not be discussed in detail here. However, some communication protocols that may be used include, but are not limited to, internet protocol (IP, IPv4, IPv6, etc.), Transmission Control Protocol (TCP), and User Datagram Protocol (UDP), as well as any other suitable communication protocol, variation, or combination thereof.
The description and drawings are included to depict specific embodiments to teach those skilled in the art how to make and use the best mode. For the purpose of teaching inventive principles, some conventional aspects have been simplified or omitted. Those skilled in the art will appreciate variations from these embodiments that fall within the scope of the disclosure. Those skilled in the art will also appreciate that the above-described features may be combined in various ways to form multiple embodiments. Accordingly, the present invention is not limited to the specific embodiments described above, but only by the claims and their equivalents.